Abstract

In this paper, we revisit the classic problem of run generation. Run generation is the first phase of 
external-memory sorting, where the objective is to scan through the data, reorder elements using a small 
buffer of size M, and output runs (contiguously sorted chunks of elements) that are as long as possible. 

We develop algorithms for minimizing the total number of runs (or equivalently, maximizing the 
average run length) when the runs are allowed to be sorted or reverse sorted. We study the problem in the 
online setting, both with and without resource augmentation, and in the offline setting. 

• We analyze alternating-up-down replacement selection (runs alternate between sorted and reverse 
sorted), which was studied by Knuth as far back as 1963. We show that this simple policy is 
asymptotically optimal. Specifically, we show that alternating-up-down replacement selection is 
2-competitive and no deterministic online algorithm can perform better. 

• We give online algorithms having smaller competitive ratios with resource augmentation. Specifically,
we exhibit a deterministic algorithm that, when given a buffer of size 4M, is able to match or
beat any optimal algorithm having a buffer of size M. Furthermore, we present a randomized online
algorithm which is 7/4-competitive when given a buffer twice that of the optimal.

• We demonstrate that performance can also be improved with a small amount of foresight. We give 
an algorithm, which is 3/2-competitive, with foreknowledge of the next 3M elements of the input 
stream. For the extreme case where all future elements are known, we design a PTAS for computing 
the optimal strategy a run generation algorithm must follow. 

• We present algorithms tailored for “nearly sorted” inputs which are guaranteed to have optimal 
solutions with sufficiently long runs. 


^Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-4400, USA.
Email: {bender, smccauley, shiksingh}@cs.stonybrook.edu.

^Department of Computer Science, University of Massachusetts, Amherst, MA 01003, USA.
Email: {mcgregor, hvu}@cs.umass.edu.



1 Introduction 


External-memory sorting algorithms are tailored for data sets too large to fit in main memory. Generally, 
these algorithms begin their sort by bringing chunks of data into main memory, sorting within memory, and 
writing back out to disk in sorted sequences, called runs | fT5| 2^ 341. 

We revisit the classic problem of how to maximize the length of these runs, the run-generation problem. 
The run-generation problem has been studied in its various guises for over 50 years | [T4l[T7| - [T9)[25||^ 3T] 34) . 

The most well-known external-memory sorting algorithm is multi-way merge sort [1, 15, 42, 44].
The multi-way merge sort is formalized in the disk-access machine (DAM) model of Aggarwal
and Vitter [1]. If M is the size of RAM and data is transferred between main memory and disk in blocks of
size B, then an M/B-way merge sort has a complexity of $O((N/B) \log_{M/B} (N/B))$ I/Os, where N is the
number of elements to be sorted. This is the best possible [1].

A top-down description of multi-way merge sort follows. Divide the input into M/B subproblems,
recursively sort each subproblem, and merge them together in one final scan through the input. The base case
is reached when each subproblem has size O(M), and therefore fits into RAM.

A bottom-up description of the algorithm starts with the base case, which is the run-generation phase.
Naively, we can always generate runs of length M: ingest M elements into memory, sort them, write them to
disk, and then repeat.

The point of run generation is to produce runs longer than M. After all, with typical values of N and
M, we rarely need more than one or two passes over the data after the initial run-generation phase. Longer
runs can mean fewer passes over the data or less memory consumption during the merge phase of the sort.
Because there are few scans to begin with, even if we only do one fewer scan, the cost of a merge sort is
decreased by a significant percentage. Run generation has further advantages in databases even when a full
sort is not required.
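To make the effect of longer runs concrete, the following minimal Python sketch counts how many merge passes are needed for a given number of initial runs. The function name `merge_passes` and the parameter `fan_in` (standing in for the merge width, roughly M/B) are illustrative assumptions and do not appear in the paper.

```python
def merge_passes(num_runs: int, fan_in: int) -> int:
    """Number of merge passes needed to reduce `num_runs` sorted runs to one,
    when each pass merges up to `fan_in` runs at a time (assumed model)."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_runs > 1:
        # Each pass replaces every group of `fan_in` runs by a single run.
        num_runs = -(-num_runs // fan_in)  # ceiling division
        passes += 1
    return passes
```

Under these assumptions, with a fan-in of 10, an input split into 1001 runs needs four passes, while the same input split into 500 runs (runs about twice as long) needs only three.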


Replacement Selection. The classic algorithm for run generation is called replacement selection | |20p6p8| . 
We describe replacement selection below by assuming that the elements can be read into memory and written 
to disk one at a time. 

To create an increasing run starting from an initially full internal memory, proceed as follows: 

1. From main memory, select the smallest element at least as large as every element in the current run.

2. If no such element exists, then the run ends; select the smallest element in the buffer. 

3. Eject that element, and ingest the next element, so that the memory stays full. 
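The three steps above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: elements are integers, the buffer is scanned linearly rather than kept in a heap, and the name `replacement_selection` is ours, not the paper's.

```python
from itertools import islice
from typing import Iterable, List


def replacement_selection(stream: Iterable[int], M: int) -> List[List[int]]:
    """Generate increasing runs with a buffer of M elements (sketch)."""
    rest = iter(stream)
    buffer = list(islice(rest, M))      # start from a full buffer
    runs: List[List[int]] = []
    current: List[int] = []
    while buffer:
        # Step 1: elements that can still extend the current run.
        candidates = [x for x in buffer if not current or x >= current[-1]]
        if not candidates:
            # Step 2: the run ends; the next run starts from the smallest element.
            runs.append(current)
            current = []
            continue
        x = min(candidates)
        # Step 3: eject x and ingest the next input element, if any.
        buffer.remove(x)
        current.append(x)
        nxt = next(rest, None)
        if nxt is not None:
            buffer.append(nxt)
    if current:
        runs.append(current)
    return runs
```

For example, on the reverse-sorted input 12, 11, ..., 1 with M = 3 this sketch produces four runs of length 3, while on the input 3, 1, 2, 6, 4, 5 with M = 2 it produces a single run.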

Replacement selection can deal with input elements one at a time, even though the DAM model transfers 
input between RAM and disk B elements at a time. To see why, consider two additional blocks in memory, 
an “input block,” which stores elements recently read from disk, and an “output block,” which stores elements 
that have already been placed in a run and will be written back to disk. To ingest, take an element from the 
input block, and to eject an element, put the element in the output block. When the input block becomes 
empty, fill it from disk, and when the output block fills up, flush it to disk. Similar to previous work, in this
paper, we ignore these two blocks. 


Properties of Replacement Selection. It has been known for decades that when the input appears in random 
order, then the expected length of a run is actually 2M, not M lT^T^|^. In | |2^ , Knuth gives memorable 
intuition about this result, conceptualizing the buffer as a snowplow traveling along a circular track. 

Replacement selection performs particularly well on nearly sorted data (for many intuitive notions of 
“nearly”), and the runs generated are much larger than M. For example, when each element in the input 
appears at a distance at most M from its actual rank, replacement selection produces a single run. 


*The external-memory model, also called the I/O model, applies to any two levels of the memory hierarchy. 

^Observe that data structures such as in-memory heaps can be used to identify the smallest elements in memory. However, from 
the perspective of minimizing I/Os, this does not matter—computation is free in the DAM model. 
On the other hand, replacement selection performs poorly on reverse-sorted data. It produces runs of
length M, which is the worst possible. 

Up-Down Replacement Selection. From the perspective of the sorting algorithm, it matters little, or not at 
all, whether the initially generated runs are sorted or reverse sorted. 

This observation has motivated researchers to think about run generation when the replacement-selection 
algorithm has a choice about whether to generate an up run or a down run, each time a new run begins. 


Knuth [25] analyzes the performance of replacement selection that alternates deterministically between
generating up runs and down runs. He shows that for randomly generated data, this alternative policy performs 
worse, generating runs of expected length 3M/2, instead of 2M. 

Martinez-Palau et al. revive this idea in an experimental study. Their two-way-replacement-selection
algorithms heuristically choose between whether the run generation should go up or down. Their experiments
find that two-way replacement selection (1) is slightly worse than replacement selection for random input (in
accordance with Knuth [25]) and (2) produces significantly longer runs on inputs that have mixed up-down
runs and reverse-sorted inputs.

Our Contributions. The results in our present paper complement these earlier results. In contrast to Knuth’s
negative result for random inputs [25], we show that strict up-down alternation is best possible for worst-case
inputs. Moreover, we give better competitive ratios with resource augmentation and lookahead, which helps 
explain why heuristically choosing between up and down runs based on what is currently in memory may lead 
to better solutions. Resource augmentation is a standard tool used in competitive analysis |[^ 11 - [T3|^ 39| 
to empower an online algorithm when comparing against an omniscient and all-powerful optimal algorithm. 

Up-down run generation boils down to figuring out, each time a run ends, whether the next run should be
an up run or a down run. The objective is to minimize the number of runs output. We establish the following:

1. Analysis of alternating-up-down replacement selection. We revisit (online) alternating-up-down
replacement selection, which was earlier analyzed by Knuth [25]. We prove that alternating-up-down replacement
selection is 2-competitive and asymptotically optimal for deterministic algorithms. To put this result in
context, it is known that up-only replacement selection is a constant factor better than up-down
replacement selection for random inputs, but can be an unbounded factor worse than optimal for arbitrary inputs.

2. Resource augmentation with extra buffer. We analyze the effect of augmenting the buffer available to an
online algorithm on its performance. We show that with a constant factor larger buffer, it is possible to
perform better than twice optimal. Specifically, we exhibit a deterministic algorithm that, when given a
buffer of size 4M, matches or beats any optimal algorithm having a buffer of size M. We also design a
randomized online algorithm which is 7/4-competitive using a 2M-size buffer.

3. Resource augmentation with extra visibility. We show that performance factors can also be improved,
without augmenting the buffer, if an algorithm has limited foreknowledge of the input. In particular, we
propose a deterministic algorithm which attains a competitive ratio of 3/2, using its regular buffer of size
M, with a lookahead of 3M incoming elements of the input (at each step).

4. Better bounds for nearly sorted data. We give algorithms that perform well on inputs that have some
inherent sortedness. We show that the greedy offline algorithm is optimal for inputs on which the optimal
runs are at least 5M elements long. We also give a 3/2-competitive algorithm with 2M-size buffer when
the optimal runs are at least 3M long. These results are reminiscent of previous literature studying sorting
on inputs with “bounded disorder” [10] and adaptive sorting algorithms [41].

5. PTAS for the offline problem. We give a polynomial-time approximation scheme for the offline
run-generation problem. Specifically, our offline polynomial-time approximation algorithm guarantees a
$(1 + \epsilon)$-approximation to the optimal solution. We first give an algorithm with the running time of
$O(2^{1/\epsilon} N \log N)$ and then improve the running time to O ^ N log .


^Note that for a given input, minimizing the number of runs output is equivalent to maximizing the average length of runs output. 
Paper Outline. The paper is organized as follows. In Section 2, we formalize the up-down run generation
problem and provide necessary notation. Section 3 contains important structural properties of run generation
and key lemmas used in analyzing our algorithms. Analysis of alternating-up-down replacement selection
and online lower bounds are in Section 4. Algorithms with resource augmentation, along with properties of
the greedy algorithm, are presented in Section 5. The offline version of the problem and improvements on
well-sorted input are studied in the sections that follow, after which we summarize related work and
conclude with open problems. Due to space constraints, we defer some proofs to the appendix
(Appendix A).

2 Up-Down Run Generation 

In this section, we formalize the up-down run generation problem and introduce notation. 

2.1 Problem Definition 

An instance of the up-down run generation problem is a stream I of elements. The elements of I are
presented to the algorithm one by one, in order. They can be stored in the memory of size M available to
the algorithm, which we henceforth refer to as the buffer. Each element occupies one slot of the buffer. In
general, the model allows duplicate elements, although some results, particularly those in Section 5,
do require uniqueness.

We say that an algorithm A reads an element of I when A transfers the element from the input sequence 
to the buffer. We say that an algorithm A writes an element when A ejects the element from its buffer and 
appends it to the output sequence S. 

Every time an element is written, its slot in the buffer becomes free. Unless stated otherwise, the next 
element from the input takes up the freed slot. Thus the buffer is always full, except when the end of the input 
is reached and there are fewer than M unwritten elements.

An algorithm can decide which element to eject from its buffer based on (a) the current contents of the 
buffer and (b) the last element written. The algorithm may also use o(M) additional words to maintain its 
internal state (for example, it can store the direction of the current run). However, the algorithm cannot 
arbitrarily access S or I —it can only append elements to S, and access the next in-order element of I. We 
say the algorithm is at time step t if it has written exactly t elements. 

A run is a sequence of sorted or reverse-sorted elements. The cost of the algorithm is the smallest
number of runs we can use to partition its output. Specifically, the number of runs in an output S, denoted
R(S), is the smallest number of mutually disjoint sequences $S_1, S_2, \ldots, S_{R(S)}$ such that each $S_i$ is a run and
$S = S_1 \circ \cdots \circ S_{R(S)}$, where $\circ$ indicates concatenation.

We let OPT(I) be the minimum number of runs of any possible output sequence on input I, i.e., the
number of runs generated by the optimal offline algorithm. If I is clear from context, we denote this as
OPT. Our goal is to give algorithms that perform well compared to OPT for every I. We say that an online
algorithm is $\beta$-competitive if on any input, its output S satisfies $R(S) \le \beta \cdot \mathrm{OPT}$.
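The cost R(S) of a given output can be computed with a single left-to-right scan, since any contiguous piece of a run is itself a run and so extending the current run as far as possible never hurts. The sketch below is a minimal Python illustration; the function name `count_runs` is an assumption of ours, and equal neighbouring elements are treated as compatible with either direction.

```python
from typing import Sequence


def count_runs(S: Sequence[int]) -> int:
    """Smallest number of contiguous sorted or reverse-sorted pieces covering S."""
    runs = 0
    i, n = 0, len(S)
    while i < n:
        runs += 1
        direction = 0          # 0 = undecided, +1 = increasing, -1 = decreasing
        j = i + 1
        while j < n:
            step = (S[j] > S[j - 1]) - (S[j] < S[j - 1])
            if step != 0:
                if direction == 0:
                    direction = step      # the first strict step fixes the direction
                elif step != direction:
                    break                 # S[j] cannot extend the current run
            j += 1
        i = j                  # the next run starts at the first element that did not fit
    return runs
```

For instance, the output 10, 11, 12, 9, 8, 7 has cost 2 (one up run followed by one down run).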

At any time step, an algorithm’s unwritten-element sequence is comprised of the contents of the buffer, 
concatenated with the remaining (not yet ingested) input elements. For the purpose of this definition, we
assume that the elements in the buffer are stored in their arrival order (their order in the input sequence I). 

Time step t is a decision point or decision time step for an algorithm A if t = 0 or if A finished writing a
run at t. At a decision point, A needs to decide whether the next run will be increasing or decreasing. 

"'Reading in the next element of the input when there is a free slot in the buffer never hurts the performance of any algorithm. 
However, we allow the algorithm in the proof of Lemma 1 to maintain free slots in the buffer to simplify the analysis.
2.2 Notation

We employ the following notation. We use $(x \nearrow y)$ to denote the increasing sequence $x, x+1, x+2, \ldots, y$
and $(x \searrow y)$ to denote the decreasing sequence $x, x-1, x-2, \ldots, y$. We use $\circ$ to denote concatenation: if
$A = a_1, a_2, \ldots, a_k$ and $B = b_1, b_2, \ldots, b_\ell$ then $A \circ B = a_1, a_2, \ldots, a_k, b_1, b_2, \ldots, b_\ell$.

Let $A = a_1, a_2, \ldots, a_k$. We use $A \oplus x$ to denote the sequence $a_1 + x, a_2 + x, \ldots, a_k + x$. Similarly, we
use $A \otimes x$ to denote the sequence $a_1 x, a_2 x, \ldots, a_k x$.

Let A, B be sequences. We say A covers B if for all $e \in B$, $e \in A$. A subsequence of a sequence
$A = a_1, \ldots, a_k$ is a sequence $B = a_{n_1}, a_{n_2}, \ldots, a_{n_\ell}$ where $1 \le n_1 < n_2 < \cdots < n_\ell \le k$.
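For readers who want to experiment with the example inputs constructed later, the notation of this section can be mirrored by a few Python helpers. This is only a sketch: the helper names (`up`, `down`, `shift`, `scale`, `covers`, `is_subsequence`) are our own, sequences are modelled as Python lists, and concatenation is list addition.

```python
from typing import List, Sequence


def up(x: int, y: int) -> List[int]:
    """Increasing sequence x, x+1, ..., y."""
    return list(range(x, y + 1))


def down(x: int, y: int) -> List[int]:
    """Decreasing sequence x, x-1, ..., y."""
    return list(range(x, y - 1, -1))


def shift(A: Sequence[int], x: int) -> List[int]:
    """Add x to every element of A."""
    return [a + x for a in A]


def scale(A: Sequence[int], x: int) -> List[int]:
    """Multiply every element of A by x."""
    return [a * x for a in A]


def covers(A: Sequence[int], B: Sequence[int]) -> bool:
    """True if every element of B also occurs in A."""
    return set(B) <= set(A)


def is_subsequence(B: Sequence[int], A: Sequence[int]) -> bool:
    """True if B can be obtained from A by deleting elements."""
    it = iter(A)
    # Each element of B must be found in A after the previous match.
    return all(any(a == b for a in it) for b in B)
```

For example, `shift(up(1, 3), 10)` yields 11, 12, 13, and `up(1, 2) + down(9, 8)` yields the concatenation 1, 2, 9, 8.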

3 Structural Properties 

In this section, we identify structural properties of the problem and the tools used in the analysis of our
algorithms, which will be important in the rest of the paper.

3.1 Maximal Runs

We show that in run generation, it is never a good idea to end a run early, and never a good idea to “skip over”
an element (keeping it in buffer instead of writing it out as part of the current run).

To begin, we show that adding elements to an input sequence never decreases the number of runs. Note
that if S' is a subsequence of S, then $R(S') \le R(S)$ by definition.

Lemma 1. Consider two input streams I and I'. If I' is a subsequence of I, then $\mathrm{OPT}(I') \le \mathrm{OPT}(I)$.

Proof. Let A be an algorithm with input stream I and output S. Suppose that A produces the optimal number
of runs on I, that is R(S) = OPT(I). Consider an algorithm A' on I'. Algorithm A' performs the same
operations as A, but when it reaches an element that is not in I' (but is in I), it executes a no-op. These
no-ops mean that the buffer of A' may not be completely full, since elements that A has in buffer do not exist
in the buffer of A'. Let S' be the output of A'; S' is a subsequence of S.

Then $\mathrm{OPT}(I') \le R(S') \le R(S) = \mathrm{OPT}(I)$. □

A maximal increasing run is a run generated using the following rules (a maximal decreasing run is 
defined similarly): 

1. Start with the smallest element in the buffer and always write the smallest element that is larger than the 
last element written. 

2. End the run only when no element in the buffer can continue the run, i.e., all elements in buffer are smaller
than the last element written.
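The two rules above can be sketched as a single Python routine that writes one maximal run in either direction. This is a minimal illustration, not the paper's pseudocode: the name `maximal_run` and its signature are assumptions, the buffer is a plain list that is modified in place, and elements equal to the last one written are allowed to continue the run.

```python
from typing import Iterator, List


def maximal_run(buffer: List[int], rest: Iterator[int], increasing: bool) -> List[int]:
    """Write one maximal run, mutating `buffer` and consuming input from `rest`."""
    run: List[int] = []
    while True:
        if not run:
            candidates = buffer                       # any element may start the run
        elif increasing:
            candidates = [x for x in buffer if x >= run[-1]]
        else:
            candidates = [x for x in buffer if x <= run[-1]]
        if not candidates:
            return run                                # rule 2: nothing can continue the run
        # Rule 1: write the closest element in the run's direction.
        x = min(candidates) if increasing else max(candidates)
        buffer.remove(x)
        run.append(x)
        nxt = next(rest, None)                        # refill the freed slot, if input remains
        if nxt is not None:
            buffer.append(nxt)
```

Starting from the buffer 5, 1, 3 with remaining input 2, 4, 0, the maximal increasing run is 1, 2, 3, 4, 5 (leaving 0 in the buffer), and the maximal decreasing run is 5, 3, 2, 1, 0 (leaving 4 in the buffer).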

Lemma 2. At any decision time step, a maximal increasing (decreasing) run r covers every other (non- 
maximal) increasing (decreasing) run r'. 

A proper algorithm is an algorithm that always writes maximal runs. We say an output is proper if it is 
generated by a proper algorithm. We show that there always exists an optimal proper algorithm. 

Theorem 3. For any input I, there exists a proper algorithm A with output S such that R(S) = OPT(I).

Proof. We prove this by induction on the number of runs. If there is only one run, it must be maximal.
Assume that all inputs $I_t$ with $\mathrm{OPT}(I_t) = t$ have an optimal proper algorithm. Consider an input $I_{t+1}$ with
$\mathrm{OPT}(I_{t+1}) = t + 1$. Assume that an optimal algorithm on $I_{t+1}$ is $A_O$, and it is not proper; we will construct
a proper A with the same number of runs. The first run A writes is maximal and has the same direction as
the first run that $A_O$ writes; the first run $A_O$ writes may or may not be maximal. Then A is left with an
unwritten-element sequence $I_A$ and $A_O$ is left with $I_O$. Note that $\mathrm{OPT}(I_O) = t$ by definition.

By Lemma 2, $I_A$ is a subsequence of $I_O$. Then by Lemma 1, $\mathrm{OPT}(I_A) \le \mathrm{OPT}(I_O)$. Then by the
inductive hypothesis, $I_A$ has an optimal proper algorithm. Thus A is a proper algorithm generating the
optimal number of runs. □

In conclusion, we have established that it always makes sense for an algorithm to write maximal runs. 
Furthermore, we use the following property of proper algorithms throughout the rest of the paper. 

Property 4. Any proper algorithm satisfies the following two properties: 

1. At each decision point, the elements of the buffer must have arrived while the previous run was being 
written. 

2. A new element can not be included in the current run if the element just written out is larger (smaller) 
and the current run is increasing (respectively, decreasing). 

3.2 Analysis Toolbox 

We now present observations and lemmas that play an integral role in analysing the algorithms presented in 
the rest of the paper. 

Observation 5. Consider algorithms $A_1$ and $A_2$ on input I. Suppose that at time step $t_1$ algorithm $A_1$ has
written out all the elements that algorithm $A_2$ already wrote out by some previous time step $t_2$. Then, the
unwritten-element sequence of algorithm $A_1$ at time step $t_1$ forms a subsequence of the unwritten-element
sequence of algorithm $A_2$ at time step $t_2$.

Lemma 6. Consider a proper algorithm A. At some decision time step, A can write k runs $p_1 \circ \cdots \circ p_k$ or $\ell$
runs $q_1 \circ \cdots \circ q_\ell$ such that $|p_1 \circ \cdots \circ p_k| > |q_1 \circ \cdots \circ q_\ell|$. Then $p_1 \circ \cdots \circ p_k \circ p_{k+1}$, where $p_{k+1}$ is either an
up or down run, covers $q_1 \circ \cdots \circ q_\ell$.

Therefore, the unwritten-element sequence after A writes $p_{k+1}$ (if A writes $p_1 \circ \cdots \circ p_{k+1}$) is a subsequence
of the unwritten-element sequence after A writes $q_\ell$ (if A writes $q_1 \circ \cdots \circ q_\ell$).

Proof. Since $|p_1 \circ \cdots \circ p_k| > |q_1 \circ \cdots \circ q_\ell|$, the set of elements that are in $q_1 \circ \cdots \circ q_\ell$ but not in $p_1 \circ \cdots \circ p_k$
have to be in the buffer when $p_k$ ends. Since $p_{k+1}$ is a maximal run, it will write all such elements. □

The next theorem serves as a template for analyzing the algorithms in this paper. It helps us restrict our
attention to comparing the output of our algorithm against that of the optimal in small partitions. We show
that if in every partition i, an algorithm writes $x_i$ runs that cover the first $y_i$ runs of an optimal output (on the
current unwritten-element sequence), and $x_i / y_i \le \beta$, then the algorithm outputs no more than $\beta \cdot \mathrm{OPT}$ runs.

Theorem 7. Let A be an algorithm with output S. Partition S into k contiguous subsequences $S_1, S_2, \ldots, S_k$.
Let $x_i$ be the number of runs in $S_i$. For $1 < i \le k$, let $I_i$ be the unwritten-element sequence after A outputs
$S_{i-1}$; let $I_1 = I$ and $I_{k+1} = \emptyset$. Let $\alpha, \beta \ge 1$. For each i, let $S'_i$ be the output of an optimal algorithm on $I_i$.

If for all $i \le k$, $S_i$ covers the first $y_i$ runs of $S'_i$, and $x_i / y_i \le \beta$, then $R(S) \le \beta \cdot \mathrm{OPT}$. Similarly, if for all
$i \le k$, $S_i$ covers the first $y_i$ runs of $S'_i$, and $\mathrm{E}[x_i] / y_i \le \alpha$, then $\mathrm{E}[R(S)] \le \alpha \cdot \mathrm{OPT}$.

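Proof. Consider $I'_{i+1}$, the unwritten-element sequence at the end of the first $y_i$ runs of $S'_i$ (we let $I'_1 = I$). We
show that $\mathrm{OPT}(I_{i+1}) \le \mathrm{OPT} - \sum_{j=1}^{i} y_j$ for all $1 \le i \le k$ using induction. Note that $\mathrm{OPT}(I_1) = \mathrm{OPT}$ (the
base case). Induction hypothesis: assume $\mathrm{OPT}(I_i) \le \mathrm{OPT} - \sum_{j=1}^{i-1} y_j$. Since $S_i$ covers the first $y_i$ runs of
$S'_i$, by Observation 5, $I_{i+1}$ is a subsequence of $I'_{i+1}$. Then by Lemma 1, $\mathrm{OPT}(I_{i+1}) \le \mathrm{OPT}(I'_{i+1})$. By definition, for
$i \ge 1$,

$\mathrm{OPT}(I'_{i+1}) = \mathrm{OPT}(I_i) - y_i \le \mathrm{OPT} - \sum_{j=1}^{i} y_j.$

Therefore, $\mathrm{OPT}(I_{i+1}) \le \mathrm{OPT} - \sum_{j=1}^{i} y_j$. When $i = k$, we have $\mathrm{OPT}(I_{k+1}) \le \mathrm{OPT} - \sum_{j=1}^{k} y_j$. Since
$I_{k+1}$ contains no elements, $\mathrm{OPT}(I_{k+1}) = 0$, and we have $\sum_{j=1}^{k} y_j \le \mathrm{OPT}$. Since $R(S) = \sum_{i=1}^{k} x_i$
and $x_i \le \beta y_i$ for every i, we have the following:

$R(S) = \frac{\sum_{i=1}^{k} x_i}{\mathrm{OPT}} \cdot \mathrm{OPT} \le \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} y_i} \cdot \mathrm{OPT} \le \beta \cdot \mathrm{OPT}.$

We also have the same in expectation, that is,

$\mathrm{E}[R(S)] = \mathrm{E}\left[\sum_{i=1}^{k} x_i\right] \le \alpha \sum_{i=1}^{k} y_i \le \alpha \cdot \mathrm{OPT}.$

□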


4 Up-Down Replacement Selection 


We begin by analyzing the alternating up-down replacement selection, which deterministically alternates
between writing (maximal) up and down runs. Knuth [25] showed that when the input elements arrive in a
random order (all permutations of the input are equally likely), alternating-up-down replacement selection
performs worse than standard replacement selection (all up runs). Specifically, he showed that the expected
length of runs generated by up-down-replacement selection is 1.5M on random input, compared to the
expected length of 2M of replacement selection.

In this section, we show that for deterministic online algorithms, alternating-up-down replacement selection
is, in fact, asymptotically optimal for any input. It generates at most twice the optimal number of runs in
the worst case. This is the best possible: no deterministic algorithm can have a better competitive ratio.
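The alternating policy can be illustrated with a short Python sketch. It is written under our own assumptions (the names `alternating_up_down` and `maximal_run` are hypothetical, the first run is an up run, and the buffer is scanned linearly); it is meant to show the behaviour of the policy, not to reproduce any particular implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def maximal_run(buffer: List[int], rest: Iterator[int], increasing: bool) -> List[int]:
    """Write one maximal run in the given direction, mutating `buffer`."""
    run: List[int] = []
    while True:
        if not run:
            candidates = buffer
        elif increasing:
            candidates = [x for x in buffer if x >= run[-1]]
        else:
            candidates = [x for x in buffer if x <= run[-1]]
        if not candidates:
            return run
        x = min(candidates) if increasing else max(candidates)
        buffer.remove(x)
        run.append(x)
        nxt = next(rest, None)
        if nxt is not None:
            buffer.append(nxt)


def alternating_up_down(stream: Iterable[int], M: int) -> List[List[int]]:
    """Alternate between maximal up runs and maximal down runs (sketch)."""
    rest = iter(stream)
    buffer = list(islice(rest, M))
    runs: List[List[int]] = []
    increasing = True                  # assumption: the first run goes up
    while buffer:
        runs.append(maximal_run(buffer, rest, increasing))
        increasing = not increasing    # flip the direction at every decision point
    return runs
```

On the reverse-sorted input 12, 11, ..., 1 with M = 3, this sketch writes the up run 10, 11, 12 followed by a single down run 9, 8, ..., 1, that is, two runs, whereas up-only replacement selection writes four runs on the same input.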


4.1 Alternating-Up-Down Replacement Selection is 2-competitive 

We begin by giving a structural lemma, analyzing identical runs on two inputs in which one input is a
subsequence of the other.

Lemma 8. Consider two inputs $I_1$ and $I_2$, where $I_2$ is a subsequence of $I_1$. Let $S_1$ and $S_2$ be proper outputs
of $I_1$ and $I_2$ such that:

1. $S_1$ and $S_2$ have initial runs $r_1$ and $r_2$ respectively,

2. $r_1$ and $r_2$ have the same direction.

Let the unwritten-element sequence after $r_1$ and $r_2$ be $I'_1$ and $I'_2$ respectively. Then $I'_2$ is a subsequence of $I'_1$.

Proof. Assume that $r_1$ and $r_2$ are up runs (a similar analysis works for down runs). Let $r'_1$ be a run that is
a subsequence of $r_1$, consisting of all elements of $r_1$ that are also in $I_2$. Then $r'_1$ can be produced by an
algorithm A' that mirrors the algorithm A that generates $r_1$. When A reads or writes an element in $I_2$, A'
reads or writes that element; when A reads or writes an element not in $I_2$, A' does nothing. Since $r_2$ is
maximal, it covers $r'_1$ by Lemma 2. □

Theorem 9. Alternating up-down replacement selection is 2-competitive.

Proof. We show that we can apply Theorem 7 to this algorithm with $\beta = 2$.

In any partition that is not the last one of the output, the alternating algorithm writes a maximal up run $r_u$
and then writes a maximal down run $r_d$. We must show that $r_u \circ r_d$ covers any run $r_O$ written by a proper
optimal algorithm on $I_r$, the unwritten-element sequence at the beginning of the partition.

If $r_O$ is an up run, then $r_O = r_u$ and thus is covered by $r_u \circ r_d$. If $r_O$ is a down run, consider I', the
unwritten-element sequence after $r_u$ is written; I' is a subsequence of $I_r$. By Lemma 8 (with $I_1 = I_r$ and
$I_2 = I'$), $r_u \circ r_d$ covers $r_O$.

In the last partition, the algorithm can write at most two runs while any optimal output must contain at
least one run. Hence $x_i / y_i \le 2$ in all partitions as required. □

4.2 Lower Bounds on Online Algorithms for Up-Down Run Generation 

Now, we show that no deterministic online algorithm can hope to perform better than alternating-up-down
replacement selection. Then, we partially answer the question of whether randomization helps overcome
this impossibility result. Specifically, we show that no randomized algorithm can achieve a competitive ratio
better than 3/2. We provide the main ideas of the proofs here and defer the details to Appendix A.

Theorem 10. Let A be any online deterministic algorithm with output $S_I$ on input I. Then there are arbitrarily
long I such that $R(S_I) \ge 2\,\mathrm{OPT}(I)$.

Proof Sketch. Given any M elements in the buffer, every time A commits to a run direction (up/down), the 
adversary sets the incoming elements such that they do not help the current run. Thus, A is forced to have 
runs of length at most M while OPT (since it has knowledge of the future) can do better. □ 

We also give a lower bound for randomized algorithms using similar ideas; however, in this case we do 
not have a matching upper bound. We use Yao’s minimax principle to prove this bound. That is, we generate 
a randomized input and show that any deterministic algorithm cannot perform better than 3/2 times OPT on 
that input against an oblivious adversary. 

Theorem 11. Let A be any online, randomized algorithm. Then there are arbitrarily long input sequences I
such that $\mathrm{E}[R(S_I)] \ge (3/2)\,\mathrm{OPT}(I)$.


5 Run Generation with Resource Augmentation 


In this section, we use resource augmentation to circumvent the impossibility result on the performance of 
deterministic online algorithms. We consider two kinds of augmentation: 

• Extra Buffer: The algorithm’s buffer is actually a constant factor larger, that is, it can use its large buffer 
to read elements from the input, rearrange them, and write to the output. 

• Extra Visibility: The algorithm’s buffer is restricted to be of size M but it has prescience—the algorithm 
can see some elements in the immediate future (say, the next 3M elements), without the ability to write 
them early. 

We present algorithms that, under the above conditions, achieve a competitive ratio better than 2 when
compared against an optimal offline algorithm with a buffer of size M.

Resource augmentation is a common tool used in competitive analysis |[^ 111^^ 391. It gives the 
online algorithm power to make better decisions and exclude worst case inputs, allowing us to compare the 
performance, more realistically, against an all-powerful offline optimal algorithm. 

The results in this section require the elements of the input to be unique. Duplicate elements can nullify
the extra ability to see or write future (non-repeated) elements which is provided by visibility and
buffer-augmentation respectively. For example, consider the input

$I = (99, 101, \underbrace{100, \ldots, 100}_{cM-2 \text{ times}}, \ldots).$

On input I, any algorithm with cM-size buffer or visibility is as powerless as the one without any
augmentation.
Note that the assumption of distinct elements in run generation is not new. Knuth’s analysis of the average
run lengths [25] also requires uniqueness.

We begin by analyzing the greedy algorithm for run generation. Greedy is a proper algorithm which 
looks into the future at each decision point, determines the length of the next up and down run and writes the 
longer run. 

Greedy is not an online algorithm. However, it is central to our resource augmentation results. The idea 
of resource augmentation, in part, is that the algorithm can use the extra buffer or visibility to determine, at 
each decision point, which direction (up or down) leads to the longer next run. 

We next look at some guarantees on the length of a run chosen by greedy (or the greedy run) and also on 
the run that is not chosen by greedy (or the non-greedy run). 
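As a minimal sketch of the greedy rule, the Python code below simulates both maximal runs at each decision point and keeps the longer one. It assumes the whole input is available as a list (greedy is offline), breaks ties in favour of the up run, and uses hypothetical names (`greedy`, `maximal_run`) that are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def maximal_run(buffer: Sequence[int], data: Sequence[int], pos: int,
                increasing: bool) -> Tuple[List[int], List[int], int]:
    """Simulate one maximal run without modifying the caller's state.

    Returns the run, the buffer left afterwards, and the new input position."""
    buf = list(buffer)
    run: List[int] = []
    while True:
        if not run:
            candidates = buf
        elif increasing:
            candidates = [x for x in buf if x >= run[-1]]
        else:
            candidates = [x for x in buf if x <= run[-1]]
        if not candidates:
            return run, buf, pos
        x = min(candidates) if increasing else max(candidates)
        buf.remove(x)
        run.append(x)
        if pos < len(data):            # refill the freed slot from the input
            buf.append(data[pos])
            pos += 1


def greedy(data: Sequence[int], M: int) -> List[List[int]]:
    """At each decision point, write the longer of the two maximal runs."""
    buffer, pos = list(data[:M]), min(M, len(data))
    runs: List[List[int]] = []
    while buffer:
        up = maximal_run(buffer, data, pos, True)
        down = maximal_run(buffer, data, pos, False)
        # Assumption: ties are broken in favour of the up run.
        run, buffer, pos = up if len(up[0]) >= len(down[0]) else down
        runs.append(run)
    return runs
```

On the reverse-sorted input 12, 11, ..., 1 with M = 3, the up run has length 3 and the down run has length 12, so this sketch writes a single down run.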


5.1 Greedy is Good but not Great 

We first show that greedy is not optimal. The following example demonstrates that greedy can be a factor of 
3/2 away from optimal. 

Example 12. Consider the input $I = I_1 \circ (I_1 \oplus 10M) \circ (I_1 \oplus 20M) \circ \cdots \circ (I_1 \oplus 10cM)$, where

$I_1 = (4M+4 \nearrow 5M+3) \circ (M+2) \circ (5M+4 \nearrow 6M+3)$
$\quad \circ\, (2M+1 \nearrow 3M-1) \circ (4M+3 \searrow 3M+4) \circ (2M \searrow M+3) \circ (M+1 \searrow 1).$

On input I above, writing down runs repeatedly produces 2c runs; two for each $I_1 \oplus 10iM$. On the
other hand, the output of greedy is $S_1 \circ (S_1 \oplus 10M) \circ \cdots \circ (S_1 \oplus 10cM)$, where
$S_1 = (4M+4 \nearrow 6M+3) \circ (M+2) \circ (2M+1 \nearrow 3M-1) \circ (3M+4 \nearrow 4M+3) \circ (2M \searrow M+3) \circ (M+1 \searrow 1)$,
so that the greedy output contains 3c runs.

Next, we show that all the runs written by the greedy algorithm (except the last two) are guaranteed to
have length at least 5M/4. In contrast, up-down replacement selection can have runs of length M in the
worst case.

Theorem 13. Each greedy run, except the last two runs, has length at least $M + \lceil \lfloor M/2 \rfloor / 2 \rceil$.

We now bound how far into the future an algorithm must see to be able to determine which direction
greedy would pick at a particular decision point. Intuitively, an algorithm should never have to choose
between a very long up run and a very long down run. We formalize this idea about the non-greedy run not
being too long in the following lemma.

Lemma 14. Given an input I with no duplicate elements, let the two possible initial increasing and
decreasing runs be $r_1$ and $r_2$. Then $|r_1| < 3M$ or $|r_2| < 3M$.

The next example shows that the above bound is tight. 

Example 15. Consider the input $I = I_1 \circ I_2 \circ I_3$, where

$I_1 = (1 \nearrow (M-1)) \otimes M, \quad I_2 = (M^2 \searrow M^2 - M + 1),$
$I_3 = (M-1 \searrow 1) \circ (M^2 + 2 \nearrow M^2 + M + 1).$

Then,

$r_1 = ((1 \nearrow (M-1)) \otimes M) \circ (M^2 - M + 1 \nearrow M^2 + M + 1),$
$r_2 = (M^2 \searrow M^2 - M + 1) \circ ((M-1 \searrow 1) \otimes M) \circ (M-1 \searrow 1).$

Thus, we have $|r_1| = 3M$ and $|r_2| = 3M - 1$.

The following lemma sheds some light on the choices made by an optimal algorithm with respect to that 
of greedy. It says, roughly, that if at any decision point, an optimal algorithm chooses to write the non-greedy 
run, and then writes the next run in the opposite direction, it performs no better than an optimal algorithm 
which chooses the greedy run in the first place. 



Lemma 16. At any decision time step consider two possible next maximal runs $r_1$ and $r_2$. If $|r_1| > |r_2|$, then
one of the following is the prefix of an optimal output on the unwritten-element sequence:

1. $r_1 \circ r_3$, where $r_3$ is a maximal run after $r_1$ and it can be either up or down.

2. $r_2 \circ r_4$, where $r_4$ is a maximal run after $r_2$ with the same direction as $r_2$.

5.2 Online Algorithms with Resource Augmentation 

We now present several online algorithms which use resource augmentation (buffer or visibility) to determine
an up-down replacement selection strategy, beating the competitive ratio of 2. For a concise summary of
results, see Figure 1.

Matching OPT using 4M-size Buffer. We present an algorithm with a 4M-size buffer that writes no more
runs than an optimal algorithm with an M-size buffer. Later on, we prove that a (4M − 2)-size buffer is necessary
even to be 3/2-competitive; thus this augmentation result is optimal up to a constant.

Consider the following deterministic algorithm with a 4M-size buffer. The algorithm reads elements until
its buffer is full. It then uses the contents of its buffer to determine, for an algorithm with buffer size M, if
the maximal up run or the maximal down run would be longer. If the maximal up run is longer, the algorithm
uses its full buffer (of size 4M) to write a maximal up run; otherwise it writes a maximal down run. The
algorithm stops when there is no element left to write.

Theorem 17. Let A be the algorithm with a 4M-size buffer described above. On any input I, A never writes
more runs than an optimal algorithm with buffer size M.

Proof Sketch. At each decision point, A determines the direction that a greedy algorithm on the same
unwritten-element sequence, but with a buffer of size M, would have picked. It is able to do so using its 4M-size
buffer because, by Lemma 14, we know the length of the non-greedy run is bounded by 3M. Note that it does
not need to write any elements during this step. In each partition, A writes a maximal run r in the greedy
direction and thus covers the greedy run by Lemma [?]. Furthermore, r covers the non-greedy run as well since
all of the elements of this run must already be in A’s initial buffer and hence get written out. An optimal
algorithm (with M-size buffer), on the unwritten-element sequence, has to choose between the greedy and
the non-greedy run. Since A covers both choices of the optimal in one run, by Theorem 7, it is able to match
or beat OPT. □
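The 4M-buffer algorithm described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `simulated_length` and `four_m_algorithm` are hypothetical, elements are assumed distinct, and the simulation treats the visible 4M-element window as if it were the whole remaining input, which (by Lemma 14) is enough to tell which of the two M-buffer runs is longer.

```python
def simulated_length(window, M, up):
    """Length of the maximal run an M-buffer algorithm would write on `window`.

    `window` is the 4M-size buffer in arrival order; elements are assumed distinct.
    """
    sign = 1 if up else -1                  # treat a down run as an up run on negated keys
    buf = [sign * x for x in window[:M]]
    nxt, last, length = M, None, 0
    while True:
        eligible = [x for x in buf if last is None or x > last]
        if not eligible:
            return length
        last = min(eligible)
        buf.remove(last)
        length += 1
        if nxt < len(window):
            buf.append(sign * window[nxt])
            nxt += 1


def four_m_algorithm(data, M):
    """Return the list of runs written by the 4M-buffer sketch on `data`."""
    cap = 4 * M
    buf, pos, runs = [], 0, []
    while buf or pos < len(data):
        while len(buf) < cap and pos < len(data):   # read until the buffer is full
            buf.append(data[pos])
            pos += 1
        # Decide the greedy direction for an M-buffer algorithm without writing anything.
        up = simulated_length(buf, M, True) >= simulated_length(buf, M, False)
        run = []
        while True:
            eligible = [x for x in buf
                        if not run or (x > run[-1] if up else x < run[-1])]
            if not eligible:
                break
            x = min(eligible) if up else max(eligible)
            buf.remove(x)                            # buf stays in arrival order
            run.append(x)
            if pos < len(data):
                buf.append(data[pos])
                pos += 1
        runs.append(run)
    return runs
```

The list-based buffer keeps the sketch short at the cost of quadratic time; a practical version would use heaps, as replacement selection usually does.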

A natural question is whether resource augmentation boosts performance automatically, without using
the run-simulation technique. However, the following example shows that our 2-competitive algorithm, even
when allowed a 4M-size buffer, may still perform as badly as it does with an M-size buffer.

Example 18. Consider the input (8M ↘ 1) ∘ (16M ↘ 8M + 1) ∘ ··· ∘ (8cM ↘ 8(c − 1)M + 1). The
alternating algorithm from Section 4.1, which alternates maximal up and maximal down runs, will write 2c
runs given a 4M-size buffer. In contrast, the optimal output with an M-size buffer has c runs.
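Example 18 can be reproduced with a small simulation. The sketch below is an illustration only: `example_18` and `count_runs` are hypothetical helpers, the alternating policy is assumed to start with an up run, and the c-run figure is obtained by simply writing every run downwards with an M-size buffer.

```python
from itertools import cycle, repeat


def example_18(M, c):
    """Build (8M ↘ 1) ∘ (16M ↘ 8M + 1) ∘ ... ∘ (8cM ↘ 8(c − 1)M + 1)."""
    data = []
    for k in range(1, c + 1):
        data.extend(range(8 * M * k, 8 * M * (k - 1), -1))
    return data


def count_runs(data, cap, directions):
    """Count maximal runs written with a buffer of size `cap`.

    `directions` yields True (up) or False (down) for each successive run.
    """
    buf, pos, runs = [], 0, 0
    dirs = iter(directions)
    while buf or pos < len(data):
        while len(buf) < cap and pos < len(data):   # fill the buffer
            buf.append(data[pos])
            pos += 1
        up, last = next(dirs), None
        while True:
            eligible = [x for x in buf
                        if last is None or (x > last if up else x < last)]
            if not eligible:
                break
            last = min(eligible) if up else max(eligible)
            buf.remove(last)
            if pos < len(data):                      # one read per write
                buf.append(data[pos])
                pos += 1
        runs += 1
    return runs


M, c = 2, 3
data = example_18(M, c)
alternating_runs = count_runs(data, 4 * M, cycle([True, False]))  # 4M-size buffer
all_down_runs = count_runs(data, M, repeat(False))                # M-size buffer
```

With M = 2 and c = 3, the alternating policy writes 2c = 6 runs despite its larger buffer, while the all-down strategy writes c = 3.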


3/2-competitive using 4M-visibility. When we say that an algorithm has X-visibility (X > M) or
(X − M)-lookahead, it means that the algorithm has knowledge of the next X elements of its unwritten-element
sequence, and can use this knowledge when deciding what to write.

However, only the usual M-size buffer is used for reading and writing. Furthermore, the algorithm must 
continue to read elements into its buffer sequentially from I, even if it sees elements further down the stream 
it would like to read or rearrange instead. 

We present a deterministic algorithm which uses 4M-visibility to achieve a competitive ratio of 3/2. At
each decision point, similar to the algorithm in Theorem 17, we can use 3M-lookahead to determine the
direction leading to the longer (greedy) run. However, unlike Theorem 17, we cannot use a large buffer to
write future elements. Instead, we do the following: write a maximal greedy run, followed by two additional
maximal runs in the same direction and opposite direction respectively.

Buffer size    Lookahead    Competitive ratio    Comments
M              -            2                    Deterministic
2M             -            1.75                 Randomized
M              3M           1.5                  Deterministic
4M             -            1                    Deterministic

Figure 1: Summary of online algorithms for run generation on any input.

We show that, at each decision point, the above algorithm is able to cover two runs of optimal (on the
unwritten-element sequence) using three runs. Lemma 14 and Lemma 16 are key in this analysis (see
Appendix A for details). Thus, we have the following.

Theorem 19. Let OPT be the optimal number of runs on input I given an M-size buffer, where I has no
duplicate elements. Then there exists an online algorithm A with an M-size buffer and 4M-visibility such
that A always outputs S satisfying R(S) ≤ (3/2)·OPT.

7/4-competitive using 2M-size buffer. We have seen that it is possible to achieve a competitive ratio of
3/2 using a standard M-size buffer as long as the algorithm is able to determine the direction leading to
the longer (greedy) run (see Theorem 19). Now we only have a 2M-size buffer. The algorithm will pick a
direction randomly, and write a maximal run in that direction using its regular M-size buffer. It uses the additional
M-size buffer to simulate a run in the opposite direction (and thus figure out which one is longer).

With probability 1/2, the algorithm is lucky and picks the greedy direction. In this case, we can cover
the first two runs of optimal (on the unwritten-element sequence) with three runs as in Theorem 19. With
probability 1/2, the algorithm picks the wrong direction and we spend four (alternating) runs to cover two
runs of optimal. Thus, in expectation we achieve a competitive ratio of (1/2)(3/2) + (1/2)(4/2) = 7/4.
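The expectation above is a two-outcome average, and it can be checked with exact arithmetic. The helper name `expected_ratio` and its parameter are assumptions made for this illustration; the run counts 3 and 4 are the ones stated in the text.

```python
from fractions import Fraction


def expected_ratio(p_greedy=Fraction(1, 2)):
    """Expected runs written per run of OPT when the greedy direction is hit w.p. p_greedy."""
    runs_if_lucky = 3      # three runs cover two runs of OPT
    runs_if_unlucky = 4    # four alternating runs cover two runs of OPT
    opt_runs = 2
    expected_runs = p_greedy * runs_if_lucky + (1 - p_greedy) * runs_if_unlucky
    return expected_runs / opt_runs
```

Setting `p_greedy` to 1 recovers the 3/2 ratio of the algorithm that always knows the greedy direction, and setting it to 0 gives the worst-case ratio of 2.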

Theorem 20. Let OPT be the optimal number of runs on input I given an M-size buffer, where I has no
duplicate elements. Then there exists an online algorithm A with a 2M-size buffer such that A always
outputs S satisfying E[R(S)] ≤ (7/4)·OPT and R(S) ≤ 2·OPT.


5.3 Lower Bound for Resource Augmentation 

We show that with less than (4M − 2)-augmentation, no deterministic online algorithm can be
3/2-competitive on all inputs. Thus, an algorithm with (4M − 2)-size buffer cannot be optimal, so Theorem 17 is
nearly tight. Similarly, Theorem 19 is nearly tight, since a (4M − 2)-size buffer implies (4M − 2)-visibility.

Theorem 21. With buffer size less than (4M − 2), for any deterministic online algorithm A, there exists an
input I such that if S is the output of A on I, then R(S) > (3/2)·OPT.

6 Offline Algorithms for Run Generation 

We give offline algorithms for run generation. The offline problem is the following: given the entire input,
compute, in polynomial time, the optimal strategy which, when executed
by a run generation algorithm (with a buffer of size M), produces the minimum possible number of runs.

For any ε, we provide an offline polynomial-time approximation algorithm that gives a (1 + ε)-approximation
to the optimal solution. This is called a polynomial-time approximation scheme, or PTAS. The
running time of our first attempt is O(2^{1/ε} N log N). We then improve the running time to O(φ^{1/ε} N log N),
where φ = (1 + √5)/2 ≈ 1.618 is the well-known golden ratio.

Simple PTAS. Our first attempt breaks the output into sequences with a small number of runs, and uses brute
force to find which set of runs writes the most elements. We show that for any ε, we can achieve a (1 + ε)-approximation
in polynomial time using this strategy.

Theorem 22. There exists an offline algorithm A that always writes an S satisfying R(S) ≤ (1 + ε)·OPT.
The running time of A is O(2^{1/ε} N log N).

Improved PTAS. We reduce the running time by bounding the number of choices we need to consider in a
brute-force search. We do this using Lemma 16.

At each decision point, an algorithm chooses between starting an increasing run r_1 and a decreasing run
r_2. If |r_1| ≥ |r_2|, then by Lemma 16 we are able to discard r_2 followed by an increasing run.

Let F_d be the number of run sequences we need to consider if d runs remain to be written (for example,
the naive PTAS has F_d = 2^d). First, the algorithm must handle all run sequences beginning with r_1; this is the
same as an instance of F_{d−1}. Then the algorithm handles all run sequences beginning with r_2 followed by a
decreasing run; this is an instance of F_{d−2}. Thus F_d = F_{d−1} + F_{d−2}; by examination, F_1 = 1 and F_2 = 2.
This is the Fibonacci sequence, which gives us the φ^{1/ε} factor in the running time.
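The saving from this pruning is easy to see numerically. The sketch below is illustrative: `sequences_to_consider` is a hypothetical name for the recurrence F_d = F_{d−1} + F_{d−2} with F_1 = 1 and F_2 = 2, and the comparison against 2^d is the naive count mentioned in the text.

```python
import math


def sequences_to_consider(d):
    """F_d from the recurrence F_d = F_{d-1} + F_{d-2}, with F_1 = 1 and F_2 = 2."""
    if d < 1:
        raise ValueError("d must be at least 1")
    f_prev, f_cur = 1, 2          # F_1, F_2
    if d == 1:
        return f_prev
    for _ in range(d - 2):
        f_prev, f_cur = f_cur, f_prev + f_cur
    return f_cur


PHI = (1 + math.sqrt(5)) / 2      # golden ratio, about 1.618

for d in (5, 10, 20):
    # pruned count, naive count 2^d, and the growth rate phi^d for comparison
    print(d, sequences_to_consider(d), 2 ** d, round(PHI ** d, 1))
```

Consecutive values of F_d grow by a factor approaching φ rather than 2, which is where the smaller base in the improved running time comes from.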

Theorem 23. There exists an offline algorithm A that writes S such that R(S) ≤ (1 + ε)·OPT. The running
time of A is O(φ^{1/ε} N log N), where φ is the golden ratio (1 + √5)/2.


7 Run Generation on Nearly Sorted Input 


This section presents results proving that up-down replacement selection performs better when the input has
inherent sortedness (or “bounded-disorder” [34]). Replacement selection produces longer runs on nearly
sorted data. In particular, if every input element is at most M away from its target position, then a single run is
produced. Similarly, we give algorithms which perform well on inputs where the optimal runs are also long.
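The single-run behaviour on nearly sorted data can be observed with a short simulation. This is a sketch under explicit assumptions rather than the paper's construction: `count_up_runs` is a hypothetical up-only replacement selection with an M-size buffer, and `block_shuffled` produces a test input by shuffling a sorted list inside disjoint blocks of M elements, so every element is fewer than M positions from its sorted position.

```python
import random


def count_up_runs(data, M):
    """Number of runs written by up-only replacement selection with an M-size buffer."""
    buf, pos, runs = [], 0, 0
    while buf or pos < len(data):
        while len(buf) < M and pos < len(data):   # fill the buffer
            buf.append(data[pos])
            pos += 1
        last = None
        while True:
            eligible = [x for x in buf if last is None or x > last]
            if not eligible:
                break
            last = min(eligible)                  # smallest element that extends the run
            buf.remove(last)
            if pos < len(data):                   # one read per write
                buf.append(data[pos])
                pos += 1
        runs += 1
    return runs


def block_shuffled(n, M, seed=0):
    """Sorted input 0..n-1, shuffled within disjoint blocks of M elements."""
    rng = random.Random(seed)
    data = list(range(n))
    for start in range(0, n, M):
        block = data[start:start + M]
        rng.shuffle(block)
        data[start:start + M] = block
    return data
```

On such an input the element that must be written next is always already in the buffer, so one run suffices; a reverse-sorted input of the same length forces a new run every M elements.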

In particular, we say that an input is c-nearly-sorted if there exists a proper optimal algorithm whose
output consists of runs of length at least cM.


3/2-competitive using 2M-size Buffer. We provide a randomized online algorithm that, on inputs which
are 3-nearly-sorted, achieves a competitive ratio of 3/2, while using an augmented buffer of size 2M.

A sketch of the algorithm follows. At each decision point, the algorithm picks a run direction at random. 
It starts a maximal run in that direction, but uses its extra M-buffer to simulate the run in the opposite 
direction. By Lemma 14, the algorithm can tell if it picked the same run as greedy (with M-buffer), similar
to Theorem 20. If the algorithm got lucky and picked the greedy run, it repeats the process.

If the algorithm picked the non-greedy run, it uses some careful bookkeeping to write elements and 
simulate the run in the opposite direction. In doing so, the algorithm winds up at the same point in the input 
it would have reached, had it written the greedy run in the first place, but with an additional cost of one run. 


Theorem 24. There exists a randomized online algorithm A using M space in addition to its buffer such that,
on any 3-nearly-sorted input I that has no duplicates, A is a 3/2-approximation in expectation. Furthermore,
A is at worst a 2-approximation regardless of its random choices.


Exact Offline Algorithm on Nearly Sorted Input. We show that the greedy (offline) algorithm is a linear-time
optimal algorithm on inputs which are 5-nearly-sorted. We first prove the following lemma.

Lemma 25. If a proper algorithm produces runs of length at least 5M on a given input with no duplicates,
then it is optimal.

Thus, we get our required linear-time exact offline algorithm.

Theorem 26. The greedy offline algorithm, i.e., picking the longer run at each decision point, is optimal on
a 5-nearly-sorted input that contains no duplicates. The running time of the algorithm is O(N).

8 Additional Related Work 


Replacement Selection. The classic algorithm for run generation is replacement selection p0| . While 
replacement selection considers only up runs, Knuth [25] analyzed alternating up-down replacement selection
in 1963. He showed that for uniformly random input, alternating up-down replacement selection produces
runs of expected length 3M/2, compared to 2M for standard replacement selection [18, 19, 26].


Recently, Martinez-Palau et al. [34] introduced two-way replacement selection (2WRS), reviving the idea
of up-down replacement selection. The 2WRS algorithm maintains two heaps in memory for up and down 
runs and heuristically decides in which heap each element must be placed. Their simulations show that 2WRS 
performs significantly better on inputs with mixed up-down, alternating up-down, and random sequences. 

Replacement selection with a fixed-sized reservoir appears in p7l[40[ . Larson |28| introduced batched 
replacement selection, a cache-conscious replacement selection which works for variable-length records. 
Koltsidas, Muller, and Viglas [27] study replacement selection for sorting hierarchical data.


Improvements for the merge phase of external sorting have been considered in pO|[T5|[37y43 [|44) , but this 
is beyond the scope of this paper. 

Reordering Buffer Management. The run generation problem is reminiscent of the buffer reordering problem
(also known as the sorting buffer problem), introduced by Racke et al. [36]. It consists of a sequence of
n elements that arrive over time, each having a certain color. A buffer, that can store up to k elements, is 
used to rearrange them. When the buffer becomes full, an element must be output. A cost is incurred every 
time an element is output that has a color different from the previous element in the output sequence. The 
goal is to design a scheduling strategy for the order in which elements must be output, so as to minimize the 
total number of color changes. The buffer reordering problem models a number of important problems in 
manufacturing processes and network routing and has been extensively studied, both in the online and offline 
case p|-|7l|^ |T2][23) . The offline version of the buffer reordering problem is NP hard 0, while the complexity 
of our problem remains unresolved. 

Patience Sort and Longest Increasing Subsequence. Patience Sort [33] is an old sorting technique used to sort
decks of playing cards. It has two phases: the creation of sorted piles or runs, and the merging of these runs.


The elements arrive one at a time and each one can be added to an existing run or starts a new run of its 
own. Unlike this paper, a legal run only consists of elements decreasing in value, and patience sort can form 
any number of parallel runs. The goal is to minimize the number of runs. The greedy strategy of placing an 
element to the left-most legal run is optimal. Moreover, the minimum number of such runs is the length of the 
longest increasing subsequence of the input Q. Patience sort has been studied in the streaming model |211. 
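The greedy pile-creation phase and its connection to the longest increasing subsequence can be sketched in a few lines. This is a standard textbook formulation rather than code from any of the cited works; `patience_pile_count` is a hypothetical name, and elements are assumed distinct so that each pile is strictly decreasing.

```python
from bisect import bisect_left


def patience_pile_count(seq):
    """Number of piles created by greedy patience sorting on distinct elements."""
    tops = []                      # tops[i] is the top card of pile i; always increasing
    for x in seq:
        i = bisect_left(tops, x)   # left-most pile whose top is larger than x
        if i == len(tops):
            tops.append(x)         # no legal pile, so x starts a new one
        else:
            tops[i] = x            # x goes on pile i, which stays decreasing
    return len(tops)
```

For example, on 3, 1, 4, 5, 9, 2, 6, 7 the greedy strategy builds five piles, matching the longest increasing subsequence 1, 4, 5, 6, 7.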

Similar to Replacement Selection, Patience Sort is able to leverage partially sorted input data. Chandramouli
and Goldstein [10] present improvements to patience sort, and combine it with replacement selection
to achieve practical speed-up.


Adaptive Sorting Algorithms. Python’s built-in sorting algorithm, Timsort [41], works by finding contiguous
runs of increasing or decreasing value during the run generation phase. External memory sorting for
well-ordered or “partially sorted” data has been studied by Liu et al. [32]. They minimize the I/O cost of the run
generation phase by finding “naturally occurring runs”. See [16] for a survey on adaptive sorting algorithms.

9 Conclusion and Open Problems


In this paper, we present an in-depth analysis of algorithms for run generation. We establish that considering
both up and down runs can substantially reduce the number of runs in an external sort. The notion of up-down
replacement selection has received relatively little attention since Knuth’s negative result [25], until its
promise was acknowledged by the experimental work of Martinez-Palau et al. [34].

The results in our paper complement the findings of Knuth [25] and Martinez-Palau et al. [34]. In particular,
strict up-down alternation being the best possible strategy explains why heuristics for up-down run
generation can lead to better performance in some cases. Moreover, our constant-factor competitive ratios
with resource augmentation and lookahead may guide follow-up heuristics and practical speed-ups.

We conclude with open problems. 

Can randomization help circumvent the lower bound of 2 on the competitive ratio of online algorithms
(without resource augmentation)? We know that no randomized online algorithm can have a competitive ratio
better than 3/2, but there is still a gap. What is the performance of the greedy offline algorithm compared to
optimal? We show that greedy can be as bad as 3/2 times optimal. Is there a matching upper bound? Can we
design a polynomial-time exact algorithm for the offline run-generation problem? We find it intriguing that our
attempts at an exact dynamic program require maintaining too many buffer states to run in polynomial time.

A Appendix: Omitted proofs


Proof of LemmaHl Without loss of generality, assume r and r' are increasing runs. Consider any time step 
t when elements from both r and r' are being written. Let B_t and B'_t be the buffer of r and r' at time step t
respectively; let C_t be the set of elements in B_t that are eventually written to r, and C'_t be the set of elements
in B'_t that are eventually written to r'. We prove inductively that C'_t ⊆ C_t. This implies that r covers r'. The
base case is true as r and r' start with the same buffer. Since we have C'_t ⊆ C_t, we must show that (a) the
element z written to r is not in C'_{t+1} and (b) the element e read into C'_{t+1} must also be in C_{t+1}.

Consider the elements z' and z written by r' and r, respectively, at time t + 1. We must have z' ≥ z; thus
either z = z', or z is never written to r' (either way it is not in C'_{t+1}).

Since e is in C'_{t+1}, it is eventually written by r'; thus e ≥ z'. Thus e ≥ z, but that means e is eventually
written by r. Since e was just read it is in B_{t+1}, thus e ∈ C_{t+1}. □

Observation A. If A has just written an element e, and is writing a down (up) run, then A cannot write any 
element larger (smaller) than e in the same run. Similarly, if A has just written e, then A cannot write both 
an element larger than e and an element smaller than e in the same run. 

Proof of Theorem Hot Let Ii, the first M elements of the input, be /i = 1,2,..., M. We divide the rest of 
the input into segments of size M. Let the (t + 1)st such segment be I_{t+1}. Then,

I_{t+1} = (1 + tM ↗ M + tM) or (−(1 + tM) ↘ −(M + tM)).

Call this a positive segment and a negative segment respectively. At time M(t − 1) + 1 we decide whether
I_{t+1} is a positive or a negative segment based on A.

Specifically, we choose I_{t+1} using either the direction of the run A is writing, or the value of the most
recent element written. If A is writing a down run, I_{t+1} is a positive segment; if A is writing an up run, I_{t+1} is
a negative segment. It may be that A has only written one element of a run (so A could turn this into either an
up run or a down run). If this element was the smallest element in the buffer of A, I_{t+1} is a negative segment.
Otherwise, I_{t+1} is a positive segment.

First we show that A must write at least one new run for each I_t, thus R(S) ≥ t. At least one run is
required for A to write I_1, so for the remainder of the proof we assume t > 1. Consider time M(t − 2) + 1,
when I_t begins. We assume that I_t is a positive segment; a mirroring argument works when I_t is a negative
segment. Furthermore, note that the elements of I_t are the largest in the instance so far.

There are two cases: A is currently writing a down run, or the initial element of a new run. 

Case 1: Algorithm A is currently writing a down run. Then the elements of I_t must be larger than any
element in A’s down run. Thus A must use another run to write the elements of I_t by Observation A.

Case 2: Algorithm A is writing the initial element of a new run. By construction, the element written is
not the smallest element in A’s buffer, but is smaller than all elements in I_t. Then A must spend one run to
write the smallest element in its buffer, and another to write I_t. Thus, I_t causes A to write a run in addition to
its current run by Observation A.

On the other hand, an offline algorithm can write I_{2i} and I_{2i+1} in one run. Assume that I_{2i} is a positive
segment; a mirroring argument works when I_{2i} is a negative segment. If I_{2i+1} is positive, both can be
written using an up run. If I_{2i+1} is negative, both can be written using a down run. Thus OPT is no more
than ⌈t/2⌉. □

Proof of Theorem [iTJ Our lower bound uses the same basic principles as Theorem [T^ We first show the 
lower bound with some repeated elements, then show how to perturb the elements to avoid repetitions. We 
generate a randomized input and show that any deterministic algorithm cannot perform better than 3/2 times 
OPT on that input. The theorem is then proven by Yao’s minimax principle. Yao’s minimax principle states
that the best expected performance of a randomized algorithm is at least as large as the expected performance
of the best deterministic algorithm over a (known) distribution of inputs. (See, e.g., [35] for details.)

As in Theorem [T^ we divide the input into segments of size 4M. Call these segments It for t = 
1, 2, ..., ⌊N/4M⌋. Note that this input is randomized: for each t, we pick one of two inputs, each with
probability 1/2. We choose either

I_t^1 = (1 ↗ M)
I_t^2 = (2M ↘ M + 1)
I_t^3 = (3M ↘ 2M + 1)
I_t^4 = (4M ↘ 3M + 1)
(a positive segment)

or

I_t^1 = (1 ↗ M)
I_t^2 = (−2M ↗ −M − 1)
I_t^3 = (−3M ↗ −2M − 1)
I_t^4 = (−4M ↗ −3M − 1)
(a negative segment).

Let I_t = I_t^1 ∘ I_t^2 ∘ I_t^3 ∘ I_t^4. A positive or negative segment is chosen randomly for each I_t with probability
1/2.

The optimal algorithm spends no more than one run per It, using an up run for a positive segment or a 
down run for a negative segment. 

We show that any deterministic algorithm requires at least one new run to write lf_i and if fort > 1. 
Further analysis shows that with probability 1 /2, any deterministic algorithm requires at least one run to write 
the remainder of if, if, and if. Note that one run is also required to write summing, this gives a total 
expected cost of (3/2)OPT. 

Consider a segment It, t>l. Once all of lf_^ has been read into its buffer, at least one element of lf_^ 
has been written. Once all of if has been read into its buffer, at least one element of lf_^ has been written. 
Finally, once all of if has been read into its buffer, at least one element x of if has been written. Applying 
Observation]^ at least one new run is required to write these three elements. 

Now we show that with probability 1 /2, an additional run is required to write if. Let x be the first element 
written by if (thus, the cost of writing x itself was handled in the above case—we show when an additional 
run is required). Note that the algorithm must choose an x before it sees any element of if (so it does now 
know if It is positive or negative). 

Let X ^ 1 and It be a positive segment. By Observation]^ an additional run is required to write both 1 
and any element of if. If 1 is not written, all of if cannot be stored in the buffer—but then, if and if cannot 
be written using one run. Similarly, let x = 1 and It be a negative segment. By Observation]^ an additional 
run is required to write both M and any element of lf\ otherwise if and if require an additional run to be 
written. 

Thus any deterministic algorithm cannot perform better than a 3/2-approximation. Applying Yao’s minimax
principle proves the theorem.

Now we perturb the input to avoid duplicate elements. We multiply each element by ⌊N/4M⌋, and add t
to each element of I_t. In other words, we use a new segment


I'_t = (I_t ⊗ ⌊N/4M⌋) ⊕ t.


Our arguments above only depended on the relative ordering of the elements, which is preserved by this 
perturbation. For example, assume It and It-i are both positive segments. Then all elements of if are less 
than all elements of lf_i, and all elements of if are greater than all elements of if. □ 

Proof of Theorem 13. We will build S constructively. At each time t where greedy chooses an up or down
run, we show that one of its choices leads to a run of length at least 5M/4. Since the greedy algorithm always
picks the longer run at each decision time step, the run with length less than 5M/4 can never be part of its
output.

Consider any time step t where t < N − 5M/4. If t is larger than this value, the final run will have length
N − t. Let the contents of the buffer at time t be σ_1, ..., σ_M. Let I' be the sequence of ⌊M/2⌋ elements of I
arriving after t.

Consider the run starting at σ_M and continuing downwards; call this down run r_1. Let r_2 be the up run
starting at σ_1 and continuing upwards. Any element of I' less than σ_{⌊M/2⌋} will be (eventually) written out
to r_1; any that are greater than σ_{⌊M/2⌋} will be written out to r_2. Every number must fall into one of these
categories, so there must be at least ⌈⌊M/2⌋/2⌉ numbers added to the larger run. Each run at least includes
the elements already in the buffer, so the larger run has length at least M + ⌈⌊M/2⌋/2⌉.

The last 5M/4 elements can be handled by greedy in at most 2 runs (since each trivially has length at
least M). Thus the above applies to all but the last two runs of greedy. □

Proof of Lemma 14. Let S_1 be an output that writes r_1 initially and S_2 be an output that writes r_2 initially.
Without loss of generality, suppose that r_1 is increasing and r_2 is decreasing. Let r_1 = r_1(1), r_1(2), ..., r_1(k)
and r_2 = r_2(1), r_2(2), ..., r_2(ℓ). The idea of the proof is to split these runs into two phases: (a) when the elements of
r_1 are smaller than the corresponding elements of r_2 and (b) when the elements of r_1 are greater than or equal
to those of r_2. During each of these two phases, we use the fact that incoming elements written by S_1 have
to be in the buffer of S_2 (and vice versa) to bound their length. We assume that both runs write exactly one
element for each element they read in; this cannot affect the length of the runs.

Let B_0 be the original buffer, i.e., the first M elements of I. Let i be the transition point between the two
phases mentioned above; in other words, r_1(i + 1) ≥ r_2(i + 1) but r_1(i) < r_2(i).

Divide ri into si and ti, where si is the first i elements of ri, and ti is the remainder of ri. We further 
divide si into sf, the elements of si that are in Bq, and Si, the elements of si that are not in Bq. Let tf be 
the elements of ti that are in Bq. Let /i be the set of elements in ri that are read in after Xj+i is written. Let 
ui be the set of elements not in ri that are read in before Xj+i is written. We define the corresponding sets 
for r 2 as well: sf, if, / 2 , r 2 , and U 2 - 

We can bound the size of several of these sets by M. Note that si cannot have more than M elements, 
since all must be stored in the buffer while S 2 is being written. Thus | sf | +1 sf | < M. Similarly, | sf | +1 sf | < 
M. We must also have |ni | < M and |tt 2 | < M. Einally, consider sf U if. Any element in sf must be read 
before time step i. Since si is disjoint from S 2 (by definition of i), all elements of sf must be in 5i’s buffer 
at time step i. All elements of tf must also be in 5i’s buffer at time step i, so |sf | + |tf | < M. 

Starting from time step i + 1, any new element e that is read in cannot be in both ri and r 2 . This means 
that all elements of /i must be in the buffer of S 2 until r 2 ends, and all elements of /2 must be in the buffer 
of 5i until ri ends. On the other hand, all elements of U 2 must eventually be a part of ri, and similarly for ui 
and r 2 . 

To begin, we show a weaker version of the lemma for runs of length 4M. We have |ri| < (|'sf| + |sf|) + 
(|^^ 2 |) + (Isf I + |ff I) + |/i| < 3M + |/i|. Then if |ri| > 4M, then |/i| > M. Since all elements of /i must 
be stored in the buffer of S 2 until r 2 ends, r 2 must end when the Mth element of /i is read in. Then we must 
have |r 2 | < |ri|. 

We have |/i| > M — \u 2 [, otherwise, |ri| < (|s^| + jsf |) + (Is^l + |ff |) + (|^^ 2 | + |/i|) < 3M. Consider 
the first M — |u 2 | — 1 elements read in after i that are eventually written to ri (this is a prefix of /i), call them 
f[. Since |/i| > M — \u 2 \, there must be another element e G /i that is read after all elements of /{. Note 
e G ri. Let t be the time when e arrives. 

At t, the buffer of S 2 must contain all elements of U 2 , as well as all elements of f[ and e. The buffer of 
S 2 is then full of elements that cannot be written in r 2 . Hence, S 2 is forced to start a new run at time t < |ri |, 
so |r 2 | < |ri|. Then we must have I/ 2 I + |ui| < M, none of the elements in these sets are in ri, and must be 
stored in Si’s buffer until ri ends (which is after r 2 ends). Einally, we have, 

^ 2 ! < (Is^l + I) + I) + (|ui| + I/ 2 I) < 3M , 


as required.

□

Proof of Theorem 17. Algorithm A will simulate the maximal up run, r_1, and maximal down run, r_2, to see
which is longer, but it does not actually need to write any elements during this simulation. By Lemma 14, if
we find that one run has length at least 3M, it must be the longer run.

We now describe exactly how to simulate a run r using 4M space without writing any elements. Algorithm
A simulates the run one step at a time. We describe the actions and buffer of an algorithm with an M-size
buffer writing r as the simulated algorithm. Without loss of generality, assume r is an up run.

Assume that all elements are stored in the buffer in the order they arrive. Thus, after t elements have been 
written, the first M + t elements of the buffer are exactly the elements the simulated algorithm has read from 
the input up to time t. Of these elements, M must be in the buffer of the simulated algorithm, while the other 
t will have been written to r; however, A does not explicitly keep track of which elements are in the buffer. 

The algorithm A keeps track of ℓ, the last element written to r, because at each t, all of the first M + t
elements larger than ℓ (a) are in the simulated algorithm’s buffer at time t and (b) will be written to r at a
later point. Thus, once no item in the first M + t elements is larger than ℓ, r must end. At each time step, the
smallest element larger than ℓ is written to r.

Specifically, at time step t, A finds the smallest element e in the first M + t elements of the buffer that is
larger than ℓ. This is the next element of r. Thus in the next time step, A updates ℓ ← e, and repeats. If no
such e can be found, no element in the first M + t (and thus no element in the buffer) can continue the run, so the run
ends at time t.
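The bookkeeping described above needs only the arrival-ordered buffer and the last written value ℓ. The following sketch illustrates it; the function name `simulate_up_run`, its return convention, and the requirement that the window hold at least M elements are assumptions made here, and elements are assumed distinct. A down run can be simulated by negating the elements of the window.

```python
def simulate_up_run(window, M):
    """Simulate an M-buffer maximal up run on `window` without writing anything.

    `window` holds the buffered elements in arrival order (at least M of them).
    Returns (t, ended): t elements would have been written so far, and `ended`
    is True if the run is known to stop there, or False if the window ran out
    before the end of the run could be observed.
    """
    last = None                                  # the value called ℓ in the text
    t = 0
    while M + t <= len(window):
        visible = window[:M + t]                 # elements read by the simulated algorithm
        eligible = [x for x in visible if last is None or x > last]
        if not eligible:
            return t, True                       # nothing can continue the run
        last = min(eligible)                     # next element of the simulated run
        t += 1
    return t, False
```

Scanning the prefix at every step keeps the sketch close to the description at the cost of quadratic time; an ordered structure over the visible prefix would find the smallest element above ℓ faster.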

The last time A can update ℓ is when the simulated algorithm has seen all elements in the buffer; in other
words, t = 3M. By Lemma 14, this is sufficient to determine which run is longer.

The algorithm now knows which run is longer; without loss of generality, assume |r_1| > |r_2|. Then the
algorithm writes a maximal run r using its 4M-size buffer in the direction of r_1. Run r is guaranteed to
contain all elements of r_1 by Lemma [?]. Since r_2 has length less than 3M by Lemma 14, all of its elements
must already be in the 4M-size buffer. Thus they are written during r because a maximal run always writes
its buffer contents. The first initial run of a proper optimal algorithm on the unwritten-element sequence has
to be either r_1 or r_2. Since r covers both r_1 and r_2, by Theorem 7 with β = 1, A never writes more runs than
an optimal algorithm with an M-size buffer. □


Lemma A. Consider two algorithms A_1 and A_2 that have the same remaining input I when they both start
writing a new maximal run, called r_1 and r_2. Let their buffers at this point be B_1 and B_2 (that may not be
full) respectively, and assume max(B_1 \ B_2) < min(B_2 \ B_1). If r_1 and r_2 are increasing then all elements
in I written to r_2 are also written to r_1. Similarly, if r_1 and r_2 are decreasing, all elements in I written to r_1
are also written to r_2.


Proof. It suffices to prove the first case where $r_1$ and $r_2$ are both increasing, as the other case can be proven
similarly. After $r_1(t)$ and $r_2(t)$ were written, let $C_1(t)$ and $C_2(t)$ be the sets of elements in the buffers of $A_1$
and $A_2$ that will be written in $r_1$ and $r_2$ at some point in the future, i.e., the sets of elements that are at least as
large as $r_1(t)$ or $r_2(t)$ respectively.

It is easy to prove by induction that the invariant

$$\max(C_1(t) \setminus C_2(t)) < \min(C_2(t) \setminus C_1(t))$$

always holds. We note that this invariant implies $r_1(t+1) \le r_2(t+1)$. Therefore, if this invariant is true for
all $t < \min\{|r_1|, |r_2|\}$, any incoming element $e \in I$ satisfies $e \in r_2 \Rightarrow e \in r_1$ as required. We now prove the
invariant:

The base case is true since $C_1(0) = B_1$, $C_2(0) = B_2$. Suppose the invariant holds for $t$; then $r_1(t+1) \le r_2(t+1)$
and a new element $e$ is read in.

Case 1: if $e \in C_2(t+1)$, then $e \ge r_2(t+1) \ge r_1(t+1)$, so $e \in C_1(t+1)$.

Case 2: if $e \in C_1(t+1)$ and $e \notin C_2(t+1)$, then $e < r_2(t) = \min(C_2(t)) < \min(C_2(t+1))$.

Hence, the invariant holds for $t + 1$.

□ 
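As a sanity check of Lemma A on one small instance, the following Python sketch runs two increasing maximal runs from different buffers over the same remaining input. It is an illustration under assumptions that are not spelled out in the text: elements are distinct, each time step writes one element and then reads one input element if any remain, and the helper name `maximal_up_run` is hypothetical.

```python
def maximal_up_run(buffer, stream):
    """Write a maximal increasing run; return the elements written."""
    buffer = list(buffer)
    written = []
    last = float("-inf")
    pos = 0
    while True:
        candidates = [x for x in buffer if x > last]
        if not candidates:            # no buffered element can extend the run
            break
        last = min(candidates)
        buffer.remove(last)
        written.append(last)
        if pos < len(stream):         # read the next input element, if any
            buffer.append(stream[pos])
            pos += 1
    return written


B1, B2 = [1, 2, 3], [3, 4, 5]
I = [6, 0, 7, 8]
# Hypothesis of Lemma A: max(B1 \ B2) < min(B2 \ B1).
assert max(set(B1) - set(B2)) < min(set(B2) - set(B1))

r1 = maximal_up_run(B1, I)
r2 = maximal_up_run(B2, I)
from_I_in_r1 = set(r1) & set(I)
from_I_in_r2 = set(r2) & set(I)
```

On this instance every input element written to `r2` is also written to `r1`, as the lemma states for increasing runs; a single example is of course not a proof.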

Observation B. On an input $I$, let $r_1 \circ \dots \circ r_k$ be the first $k$ runs of an optimal output. If $r'_1 \circ \dots \circ r'_k$ are $k$
runs that cover $r_1 \circ \dots \circ r_k$, then $r'_1 \circ \dots \circ r'_k$ are also the first $k$ runs of an optimal output.

Proof of Lemma [?]. Without loss of generality, assume $r_1$ and $r_2$ are initial maximal increasing and
decreasing runs respectively and $|r_1| \ge |r_2|$. Suppose $r_2 \circ r_3$, where $r_3$ is increasing, is a prefix of an optimal
output $S_{OPT}(I)$.

Consider the case $|r_1| = |r_2|$. Let their buffers at the end of these two runs be $B_1$, $B_2$ and let $j$ be the
smallest index such that $r_1(j+1) > r_2(j+1)$. Consider any new element $e \in I$ that is read in before
$r_1(j+1)$ is written. Obviously, we have:

$$e \in B_1 \Rightarrow e < r_1(j),$$
$$e \in B_2 \Rightarrow e > r_2(j).$$

Consider any new element $e$ that is read in after $r_1(j+1)$, $r_2(j+1)$ were written. It is easy to see the
following:

$$e \in B_1 \setminus B_2 \Rightarrow e < r_1(j+1),$$
$$e \in B_2 \setminus B_1 \Rightarrow e > r_2(j+1).$$

Therefore, $\max(B_1 \setminus B_2) < \max(r_1(j), r_2(j+1)) < \min(r_2(j), r_1(j+1)) < \min(B_2 \setminus B_1)$. The situation
can be visualized in Figure 2 as follows. If an incoming element cannot be written in the current run, it lies
below or above (depending on whether the run is increasing or decreasing) the last element written. The
regions are marked with their associated sets described above.



Figure 2: Visualizing the buffer states. 

Consider $r_1 \circ r_4$, where $r_4$ is a maximal increasing run. Every element in $r_2$ will be written in either $r_1$
or $r_4$ by Lemma [?]. If $e \in r_3$, then we consider the cases where $e \in B_2$ or $e \in r_3 \setminus B_2$. If $e \in B_2$, then $e$ is
either in $r_1$ or in $B_1$, which means $e$ is either in $r_1$ or $r_4$. If $e \in r_3 \setminus B_2$, then $e \in r_4$ by Lemma A, using the fact
that $\max(B_1 \setminus B_2) < \min(B_2 \setminus B_1)$. Thus, $r_1 \circ r_4$ covers $r_2 \circ r_3$. As a result, $r_1$ is also a prefix of an optimal
output $S_{OPT}(I)$ by Observation B.

If $|r_1| > |r_2|$ and $|r_2| = k$, then the simplest argument goes as follows. Instead of arguing based
on $r_1$ directly, we consider a run $r_f$ that is increasing but may not be maximal. Consider an algorithm $A_f$ that
writes $r_f(1) = r_1(1), \dots, r_f(k) = r_1(k)$. Then, without reading any new element in, it finishes its first
run $r_f$ by writing out all elements in its buffer that are larger than $r_1(k)$ (the set of these elements is $H$ in
Figure 2). After this extra step, let the buffer be $B_f$. Using the exact same argument as above, we have that
$\max(B_f \setminus B_2) < \max(r_f(j), r_2(j+1)) < \min(r_2(j), r_f(j+1)) < \min(B_2 \setminus B_f)$. Using the same argument
as in the first case, we have that $r_f$ followed by a maximal increasing run will cover $r_2$ followed by a maximal
increasing run. Hence, $r_f$ is a prefix of an optimal output. Since $r_1$ covers $r_f$, it is also a prefix of an optimal
output by Observation B. □


Proof of Theorem 19. In any partition that is not the last one, let $I_r$ be the unwritten-element sequence and
let $r_1$, $r_2$ be the two possible maximal initial runs, where $r_1$ is increasing and $r_2$ is decreasing. Without loss
of generality, suppose $|r_1| > |r_2|$. We use the simulation technique of Theorem 17 to determine which run is
longer.

The algorithm writes $r_1 \circ r_3 \circ r_4$, where $r_3$ and $r_4$ are maximal runs that have the same and opposite
directions as $r_1$ respectively. The algorithm stops when there is no element left to write. We break our
analysis into cases based on what runs are in an optimal output. In each case, we show that Theorem 7 proves
a competitive ratio of $\beta = 3/2$.

If $r_2$ is a prefix of a proper optimal output $S_{OPT}(I_r)$, let $r_5$ be the maximal run after $r_2$ in $S_{OPT}(I_r)$.
After writing $r_3$, the algorithm has already written out all elements in $r_2$ by Lemma [?]. Let the unwritten-element
sequence after writing $r_3$ be $I_3$ and let the unwritten-element sequence after writing $r_2$ be $I_2$. By Lemma [?],
$I_3$ is a subsequence of $I_2$. According to Lemma 16, $r_5$ has to be decreasing in order to possibly have fewer
runs than writing $r_1$ initially. Hence, applying Lemma [?] to $r_4$ and $r_5$, we know that at the end of $r_4$, the
algorithm has written all elements of $r_2$ and $r_5$. Thus, $r_1 \circ r_3 \circ r_4$ covers $r_2 \circ r_5$.

If $r_1 \circ r_3$ is a prefix of $S_{OPT}(I_r)$, then we are done, as $r_1 \circ r_3 \circ r_4$ trivially covers $r_1 \circ r_3$.

If $r_1$ is a prefix of $S_{OPT}(I_r)$ but $r_1 \circ r_3$ is not a prefix of $S_{OPT}(I_r)$, then let $r_6$ be the opposite maximal
run to $r_3$, i.e., $r_1$, $r_6$ are the first two runs of $S_{OPT}(I_r)$. We have that $I_3$ is a subsequence of $I_1$. Hence, applying
Lemma [?] to $r_6$ on input $I_1$ and $r_4$ on input $I_3$, we have that at the end of $r_4$, the algorithm has written out all
elements in $r_1 \circ r_6$. Thus, $r_1 \circ r_3 \circ r_4$ covers $r_1 \circ r_6$.

In the last partition, since $A$ outputs at most 3 runs, it can only achieve a ratio worse than 3/2 if the
optimal algorithm wrote out a single run. But then that run is longer, and $A$ would choose it. Therefore, we
have $R(S) \le (3/2) \cdot OPT$. □


Proof of Theorem 20. In each partition, let the unwritten-element sequence be $I_r$ and the optimal proper
output of $I_r$ be $S_{OPT}(I_r)$. The algorithm randomly picks the direction of the next run and writes a maximal
run in that direction using an $M$-size buffer. It uses the extra $M$ buffer slots to simulate the buffer state of
the maximal run in the other direction to check if the run it chose is at least as long as the other run. If the
algorithm picked the run that is at least as long as the other run, it then writes a maximal run in the same
direction followed by another maximal run in the opposite direction. The algorithm stops when there are no
more elements to write. In the proof of Theorem 19, we showed that these three runs will cover the first two
runs of $S_{OPT}(I_r)$.

If the algorithm picked the shorter run, then it writes three more maximal runs with alternating directions.
We know that the first two runs with alternating directions cover the first run of $S_{OPT}(I_r)$, as argued in the
proof of Theorem [?]; hence, the next two runs with alternating directions cover the second run of $S_{OPT}(I_r)$.

In the last partition, if $OPT(I_r) = 2$, the analysis is the same. If $OPT(I_r) = 1$, then the optimal output
must be the longer maximal run. If the algorithm picked the shorter run, it will cover the longer run when
it writes the next maximal run in the opposite direction, as shown in the proof of Theorem [?]. Therefore, we
have $E[x_i]/y_i \le (1/2)(4/2) + (1/2)(3/2) = 7/4$.

Applying Theorem [?] with $\alpha = 1.75$ and $\beta = 2$, we have: $E[R(S)] \le (7/4) \, OPT$ and $R(S) \le 2 \, OPT$. □
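The expected and worst-case ratios in this proof can be checked with exact rational arithmetic. The snippet below is only a worked calculation; the variable names are introduced here for illustration.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Shorter run picked first: 4 runs written per 2 optimal runs.
# Longer run picked first: 3 runs written per 2 optimal runs.
expected_ratio = half * Fraction(4, 2) + half * Fraction(3, 2)
worst_ratio = max(Fraction(4, 2), Fraction(3, 2))
```

Here `expected_ratio` evaluates to 7/4 and `worst_ratio` to 2, matching the two bounds stated in the proof.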

Proof of Theorem 21. Suppose an algorithm has a $(4M - 3)$-size buffer. Consider the input $I_1 \circ e \circ I_2$ where
$I_1 = (1 \nearrow M - 1) \circ (2M - 1 \searrow M) \circ (3M \nearrow 4M - 2) \circ (-M \searrow -2M + 2)$.
If $S$ first writes $-2M + 2$, then let $e = -2M + 1$.

• Case 1: if $S$ writes $e = -2M + 1$ next, then let $I_2 = (0 \searrow -(M - 1))$. Thus, $S$ has to spend at least two
runs while an optimal output is one run: $(4M - 2 \searrow 3M) \circ (2M - 1 \searrow -2M + 1)$.

• Case 2: if $S$ writes $-2M + 3$ next, let $I_2 = (-2M \searrow -10M) \circ (2M \nearrow 3M - 1)$. Then $S$ has to spend at
least 3 runs while an optimal output has 2 runs: $(4M - 2 \searrow 3M) \circ (2M - 2 \searrow -10M) \circ (2M \nearrow 3M - 1)$.

Similarly, if $S$ first writes $4M - 2$, then let $e = 4M - 1$.

• Case 1: if $S$ writes $e = 4M - 1$ next, then let $I_2 = (2M \nearrow 3M - 1)$.

• Case 2: if $S$ writes $4M - 3$ next, let $I_2 = (4M \nearrow 10M) \circ (0 \searrow -M + 1)$.

If $S$ first writes $e' \notin \{-2M + 2, 4M - 2\}$, then let $e = -2M + 1$, $I_2 = (0 \searrow -(M - 1))$. Thus, $S$
has to spend at least two runs while an optimal output has the following output with one run: $(4M - 2 \searrow
3M) \circ (2M - 1 \searrow -2M + 1)$. □
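The adversarial input can be built concretely to check its size and contents. The sketch below assumes that $(a \nearrow b)$ and $(a \searrow b)$ denote runs of consecutive integers, which is our reading of the notation; the helper names `up`, `down`, and `first_block` are hypothetical. It checks only that $I_1$ has exactly $4M - 3$ elements and that, in the first case, the stated one-run output is a decreasing arrangement of exactly the input elements. It does not simulate any algorithm.

```python
def up(a, b):
    """(a ↗ b): consecutive integers a, a+1, ..., b (assumed meaning)."""
    return list(range(a, b + 1))


def down(a, b):
    """(a ↘ b): consecutive integers a, a-1, ..., b (assumed meaning)."""
    return list(range(a, b - 1, -1))


def first_block(M):
    """The prefix I_1 of the adversarial input for a given M."""
    return (up(1, M - 1) + down(2 * M - 1, M)
            + up(3 * M, 4 * M - 2) + down(-M, -2 * M + 2))


M = 5
I1 = first_block(M)
e = -2 * M + 1                      # adversary's reply when S writes -2M+2 first
I2 = down(0, -(M - 1))              # Case 1 continuation
one_run = down(4 * M - 2, 3 * M) + down(2 * M - 1, -2 * M + 1)
```

With $M = 5$, `I1` has 17 elements, so it exactly fills a buffer of size $4M - 3$.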

Proof of Theorem 22. We apply Theorem 7 with $x = \lceil 1/\epsilon \rceil + 1$ and $y = \lceil 1/\epsilon \rceil$. In any partition except
the last one, the algorithm chooses the combination of $\lceil 1/\epsilon \rceil$ maximal runs $r_1 \circ \dots \circ r_{\lceil 1/\epsilon \rceil}$ whose output is
longest (ties are broken arbitrarily) and writes out one extra run $r_{\lceil 1/\epsilon \rceil + 1}$. By Lemma [?], $r_1 \circ \dots \circ r_{\lceil 1/\epsilon \rceil + 1}$
covers the first $\lceil 1/\epsilon \rceil$ runs of a proper optimal output of the unwritten-element sequence in $1 + \lceil 1/\epsilon \rceil$
runs. In the last partition, the algorithm chooses a combination of runs with the smallest number of runs.
Therefore, we obtain a $\beta = 1 + 1/\lceil 1/\epsilon \rceil \le 1 + \epsilon$ approximation.

There are $2^{\lceil 1/\epsilon \rceil}$ combinations to consider (each run can be up or down). The length of a run can be
calculated in $O(N_i)$ time by simulating it directly, where $N_i$ is the length of the longest output, namely,
$|r_1 \circ \dots \circ r_{\lceil 1/\epsilon \rceil}|$. Since $N_i$ items are then written out, the total running time is $O(\sum_i N_i 2^{\lceil 1/\epsilon \rceil}) =
O(N 2^{1/\epsilon})$. Searching for the shortest way to write out the remaining elements (once $I_r = \emptyset$) takes $O(N 2^{1/\epsilon})$
time, which does not affect the running time. □
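The exhaustive search over direction combinations can be sketched as follows. This is a simplified illustration and not the paper's implementation: it assumes distinct elements, uses a naive list-based buffer, and the names `maximal_run` and `best_combination` are hypothetical. It tries all $2^{\lceil 1/\epsilon \rceil}$ up/down assignments and keeps the one that writes the most elements.

```python
from itertools import product
from math import ceil


def maximal_run(buffer, rest, up):
    """Write one maximal run; return (run, remaining buffer, remaining input)."""
    buffer, run, last, pos = list(buffer), [], None, 0
    while True:
        cands = [x for x in buffer
                 if last is None or (x > last if up else x < last)]
        if not cands:
            break
        last = min(cands) if up else max(cands)
        buffer.remove(last)
        run.append(last)
        if pos < len(rest):           # refill the freed buffer slot
            buffer.append(rest[pos])
            pos += 1
    return run, buffer, rest[pos:]


def best_combination(items, M, eps):
    """Try every up/down assignment for ceil(1/eps) runs; keep the longest."""
    k = ceil(1 / eps)
    best = None
    for dirs in product((True, False), repeat=k):   # True = up, False = down
        buffer, rest, written = items[:M], items[M:], 0
        for up in dirs:
            run, buffer, rest = maximal_run(buffer, rest, up)
            written += len(run)
        if best is None or written > best[0]:       # ties keep the first found
            best = (written, dirs)
    return best


items = [5, 1, 4, 2, 3, 6]
single = best_combination(items, M=2, eps=1.0)
double = best_combination(items, M=2, eps=0.5)
```

With `eps=1.0` a single run is chosen, and the down run (4 elements) beats the up run (3 elements); with `eps=0.5` two runs suffice to write all six elements.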

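Proof of Theorem 23. In each partition, we restrict the search for the combination of $\lceil 1/\epsilon \rceil$ consecutive
runs that writes the longest sequence as described above. By Lemma 16, if $d$ runs remain to be written out,
we must examine one subcase with $d - 1$ runs remaining, and one with $d - 2$ runs remaining. Thus, the
number of combinations we need to consider is $F_d = F_{d-1} + F_{d-2}$. Therefore, the running time of this step
is $O(F_{\lceil 1/\epsilon \rceil} N_i \log N_i)$. Thus, we have

$$O\Big(\sum_i F_{\lceil 1/\epsilon \rceil} N_i \log N_i\Big) = O(\varphi^{\lceil 1/\epsilon \rceil} N \log N).$$

This is because $F_{\lceil 1/\epsilon \rceil} = (\varphi^{\lceil 1/\epsilon \rceil} - (1 - \varphi)^{\lceil 1/\epsilon \rceil})/\sqrt{5} < \varphi^{\lceil 1/\epsilon \rceil}$, where $\varphi = (1 + \sqrt{5})/2$. □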
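The gap between the restricted search and the full enumeration can be seen numerically. The sketch below assumes the standard base cases $F_1 = F_2 = 1$, which the proof does not state explicitly, and compares the Fibonacci count with the closed form and with $2^d$.

```python
from math import sqrt


def fib(d):
    """F_d with F_1 = F_2 = 1 (standard base cases, assumed here)."""
    a, b = 1, 1
    for _ in range(d - 2):
        a, b = b, a + b
    return b if d >= 2 else a


phi = (1 + sqrt(5)) / 2
d = 10
binet = (phi**d - (1 - phi)**d) / sqrt(5)   # closed form for F_d
```

For $d = 10$, the restricted search examines `fib(10)` combinations, which is below $\varphi^{10}$ and far below the $2^{10}$ combinations of the unrestricted search.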

Proof of Theorem 24. At each decision time step, A flips a coin to pick a direction for the next run r. It
begins writing an up or down run according to the coin flip.

Meanwhile, A uses M additional space to simulate r', the run in the opposite direction. In particular, it 
simulates the contents of the buffer at each time step, as well as the last element written. Note that A does not 
need to keep track of the most recent element read when simulating r', as it is always the last element in the 
buffer. 

By Lemma 14, the run with the incorrect direction has length less than 3M and the run with the correct
direction has length of 3M or more. Thus, A can tell if it picked the correct direction. With probability 1/2,
A writes the longer run; in that case, it knows it chose the correct direction and repeats, flipping another coin.

Now consider the case where A picks the wrong direction. When r ends (at time t), r' is continuing. Then
A must act exactly as if it had written r'. Specifically, we cannot simply cover r' and use an argument akin to
Theorem 7, as then the unwritten-element sequence may not be 3-nearly-sorted.

To simulate r', A has two tasks: (a) A must write all elements that were written by r' that were not
written by r, and (b) A must “undo” writing any element that was written during r that is not in r', in case
these elements are required to make a subsequent run have length 3M. Divide the buffer into two halves: Br
is the buffer after writing r, and Br' is the buffer being simulated when r ends.

The first task is to ensure that A writes all elements written by r' that were not written by r in the direction 
of r'. These elements must be in Br since they were not written by r; and they must not be in Br' since they 
were written by r'. Thus, A can simply write out each element in Br that is not in Br' and continue writing 
r' from that time step. 

The second task is to ensure that all elements written during r that were not written during r' cannot affect
future run lengths. These elements must be in Br' but not in Br. We mark these elements as special ghost
elements. We can do this with O(1) additional space by moving them to the front of the buffer and keeping
track of how many of them there are. During subsequent runs, these are considered to be a part of A's buffer.
However, when A would normally want to write one of these elements out, it instead simply deletes it from
its buffer without writing any element. That said, A still counts these deletions towards the size of the run.
Note that our buffer never overflows, as A continues to write (or delete) one element per time step.
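The two tasks amount to two set differences between the real buffer and the simulated one. The following Python sketch illustrates this under the assumption of distinct elements; the function name `reconcile` and the list representation of the buffers are hypothetical, and the O(1)-space bookkeeping described in the text is not modeled.

```python
def reconcile(B_r, B_r_prime):
    """Split the two buffers into elements to write and ghost elements."""
    kept, simulated = set(B_r), set(B_r_prime)
    to_write = sorted(kept - simulated)    # written by r' but not by r
    ghosts = sorted(simulated - kept)      # written by r but not by r'
    return to_write, ghosts


# Example: A holds B_r after the wrong run r; B_r' is the simulated buffer.
to_write, ghosts = reconcile([3, 8, 9], [8, 1, 2])
```

Since both buffers hold the same number of elements, the two differences have equal size, so every element A writes out to catch up with r' frees a slot for one ghost element. The ascending order used here suits an up run r'; for a down run the elements would be written in descending order.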

When this simulation is finished, the contents of A's buffer are exactly what they would have been had
it written out r' in the first place; however, some are ghost elements, and will be deleted instead of written.
Then A repeats, flipping another coin.

For each run in the optimal output, either: A writes that run exactly for cost 1 (with probability 1/2),
or A writes another run, and makes up for its mistake by simulating the correct run exactly, for cost 2 (with
probability 1/2). Thus A has expected cost (3/2) OPT. In the worst case, A guesses incorrectly each time
for a total cost of 2 OPT.

□ 
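The cost accounting at the end of this proof can be verified by enumerating every sequence of coin flips for a small number of optimal runs. This is a worked check only; `cost_distribution` is a name introduced here, and each optimal run is assumed to cost 1 on a correct guess and 2 on an incorrect one, as in the argument above.

```python
from fractions import Fraction
from itertools import product


def cost_distribution(n):
    """Expected and worst total cost over all 2^n equally likely guess sequences."""
    costs = [sum(1 if correct else 2 for correct in flips)
             for flips in product((True, False), repeat=n)]
    return Fraction(sum(costs), len(costs)), max(costs)


expected, worst = cost_distribution(4)
```

For four optimal runs the expected cost is 6 and the worst case is 8, i.e. 3/2 and 2 times the optimal count respectively.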


Proof of Lemma [?]. Suppose we are at a decision time step. Without loss of generality, assume this time
step to be 0. Let the next two possible maximal runs be $r_1$ and $r_2$, which are up and down respectively. Without
loss of generality, suppose $|r_1| \ge 5M$. Let $r_3$ be the maximal decreasing run that follows $r_2$. By Lemma [?],
either writing $r_1$ or writing $r_2 \circ r_3$ is optimal on the unwritten-element sequence. Call the two outputs $S_1$ and
$S_2$ respectively.

Let $r_1 = (x_1, \dots, x_k)$, $r_2 = (y_1, \dots, y_\ell)$, $r_3 = (y_{\ell+1}, \dots, y_q)$. Let the buffers of $S_1$ and $S_2$ after time step $t$ be
$B_{1,t}$, $B_{2,t}$. Let $j \ge \ell$ be the smallest index such that $x_{j+1} > y_{j+1}$. Let $s_1^B, \dots, u_1, f_2, \dots$ be the same sets
described in the proof of Lemma [?].

Similar to the proof of Lemma [?], we let $r_{1,2} = \{x_{\ell+1}, \dots, x_j\}$ and $t_{1,2} = \{x_{j+1}, \dots, x_k\}$. Let $s^N_{1,2}$ be
the set of elements in $r_{1,2}$ but not in $B_{1,\ell}$. Let $s^B_{1,2}$ be the set of elements in $r_{1,2}$ and also in $B_{1,\ell} \setminus r_2$. Let $s^{+B}_{1,2}$
be the set of elements in $t_{1,2}$ and also in $B_{1,\ell} \setminus r_2$. Let $u_{1,2}$ be the set of elements not in $r_1$ and read in before
$x_{j+1}$ is written. Let $f_{1,2}$ be the set of elements in $r_1$ and read in after $x_{j+1}$ is written.

We define $s_3 = \{y_{\ell+1}, \dots, y_j\}$ and $t_3 = \{y_{j+1}, \dots, y_q\}$. Let $s^N_3$ be the set of elements in $s_3$ but not in
$B_{2,\ell}$. Let $s^B_3$ be the set of elements in $s_3 \cap B_{2,\ell}$. Let $u_3$ be the set of elements that are not in $r_3$ and read in
before $y_{j+1}$ is written. Let $f_3$ be the set of elements in $r_3$ and read in after $y_{j+1}$ is written.

Since the buffer of $S_2$ must keep all elements in $s^B_{1,2}$, $s^{+B}_{1,2}$ until time step $j + 1$, we have
$|s^B_{1,2}| + |s^{+B}_{1,2}| \le M$. We have

$$|r_1| = |r_2| + (|u_3| + |s^N_3|) + (|s^B_{1,2}| + |s^{+B}_{1,2}|) + |f_{1,2}|$$
$$\le (2M + |u_1| + |f_2|) + (M - |u_1| - |f_2|) + M + |f_{1,2}|$$
$$= 4M + |f_{1,2}|.$$

Since $|f_{1,2}| \ge M$ because of our assumption $|r_1| \ge 5M$, $r_3$ has to end before $r_1$, using the same argument as
in the proof of Lemma 14. Since $|r_1| > |r_2 \circ r_3|$, $r_1$ followed by any maximal run will cover $r_2 \circ r_3$. Therefore,
$r_1$ is an optimal prefix of the unwritten-element sequence. At every time step, the maximal run of length $5M$
or more is always a prefix of an optimal output on the unwritten-element sequence, as required. □